Nucleic acids and proteins with growth hormone activity

ABSTRACT

The invention relates to novel growth hormone activity (GHA) proteins and nucleic acids. The invention further relates to the use of the GHA proteins in the treatment of growth hormone related disorders.

This application is a continuing application of U.S. Ser. No. 60/133,784, filed May 12, 1999.

FIELD OF THE INVENTION

The invention relates to novel growth hormone activity (GHA) proteins and nucleic acids. The invention further relates to the use of the GHA proteins in the treatment of growth hormone (hGH) related disorders.

BACKGROUND OF THE INVENTION

Human growth hormone (hGH), also known as somatotropin, is a single-chain polypeptide hormone of 191 amino acids (molecular weight of approximately 22 kD) that is synthesized in the somatotropic cells of the anterior pituitary. It plays an important role in somatic growth through its effects on the metabolism of proteins, carbohydrates, and lipids. hGH is a member of a family of homologous hormones that also includes placental lactogens and prolactins [Nicoll et al., Endocr. Rev. 7(2):169-203 (1986)]. Several distinct biological activities have been ascribed to hGH, including (i) linear growth (somatogenesis), (ii) lactation, (iii) activation of macrophages, and (iv) insulin-like and diabetogenic effects [Chawla et al., Annu. Rev. Med. 34:519-47 (1983); Edwards et al., Science 239(4841 Pt 1):769-71 (1988); Thorner and Vance, J. Clin. Invest. 82(3):745-7 (1988)]. These biological effects derive from the interaction between hGH and specific cellular receptors, such as the growth hormone receptor [Leung et al., Nature 330(6148):537-43 (1987)] or the prolactin receptor [Boutin et al., Cell 53(1):69-77 (1988)].

The binding of a single growth hormone (GH) molecule to a pair of GH receptors (GHR) induces receptor dimerization, promotes the rapid association of GHR with the tyrosine kinase JAK2, and activates a phosphorylation cascade that begins with activation of the receptor-associated kinase JAK2. This results in the tyrosyl phosphorylation of the kinase itself and of the cytoplasmic domain of the receptor. The phosphorylated tyrosine residues act as docking sites for various signaling molecules that contain Src homology 2 (SH2) or other phosphotyrosyl-binding domains. Among these are the STAT proteins 1, 3 and 5 (signal transducers and activators of transcription); the insulin receptor substrates (IRS) 1 and 2, which are believed to mediate some of the metabolic effects of GH; and the adaptor protein Shc, which leads to activation of the Ras/MAP kinase pathway. Second messengers such as diacylglycerol, calcium, and nitric oxide are also involved. Ultimately, these pathways modulate cellular functions such as gene transcription, metabolite transport, and enzymatic activities that affect GH-dependent control of growth and metabolism. Activation by GH is very transient, and several mechanisms are involved in this downregulation: internalization and degradation of the receptor, and recruitment of phosphatases or of specific inhibitors of the JAK/STAT pathway, the SOCS proteins [for review, see Finidori, Vitam. Horm. 59:71-97 (2000); Carter-Su et al., Endocr. J. 43 Suppl:S65-70 (1996); Cambell, J. Pediatr. 131(1 Pt 2):S42-4 (1997)].

GH can be isolated from human pituitary glands or can be prepared recombinantly. There are two commercially available forms of the genetically engineered hormone, one of which is identical in amino acid sequence to naturally occurring human growth hormone. The other form, produced in a prokaryotic cell, has an additional methionine residue at the N-terminus of the protein. Recombinant forms of hGH have been available since 1993 for the long-term treatment of children who have growth failure due to lack of adequate endogenous growth hormone secretion. The product is currently administered by either intramuscular or subcutaneous injection and is stored in a refrigerator at 2-8 °C.

GH has been reported to have a role in, has been suggested for therapy in, or has shown efficacy in the treatment of (i) hypochondroplasia and idiopathic short stature [Ramaswami et al., Acta Paediatr. Suppl. 88(428):116-7 (1999); Kamp and Wit, Horm. Res. 49 Suppl. 2:67-72 (1998)]; (ii) girls with Turner syndrome [de Muinck Keizer-Schrama and Sas, Acta Paediatr. Suppl. 88(433):126-9 (1999); Haeusler, Horm. Res. 49 Suppl. 2:62-6 (1998)]; (iii) growth delay in burned children [Low et al., Lancet 354(9192):1789 (1999); Herndon et al., Horm. Res. 45 Suppl. 1:29-31 (1996)]; (iv) GH replacement in GH-deficient adults [Bengtsson et al., J. Clin. Endocrinol. Metab. 85(3):933-42 (2000); Cook et al., Adv. Intern. Med. 45:297-315 (2000); Welle, Curr. Opin. Clin. Nutr. Metab. Care 1(3):257-62 (1998); Abs et al., Clin. Endocrinol. (Oxf) 50(6):703-13 (1999); Clark and Kendall, J. Clin. Pharm. Ther. 21(6):367-72 (1996)]; (v) muscle wasting under conditions including surgical stress, renal failure, muscular dystrophy, glucocorticoid administration and HIV infection [Welle, supra; Windisch et al., Ann. Pharmacother. 32(4):437-45 (1998); Mentser et al., J. Pediatr. 131(1 Pt 2):S20-4 (1997); Hirschfeld, Horm. Res. 46(4-5):215-21 (1996)]; (vi) congestive heart failure and cardiovascular drug therapy [Cittadini et al., Miner. Electrolyte Metab. 25(1-2):51-5 (1999); Johnson and Gheorghiade, Am. Heart J. 137(6):989-91 (1999); Sacca, Baillieres Clin. Endocrinol. Metab. 12(2):217-31; Gomberg-Maitland and Frishman, Am. Heart J. 132(6):1244-62 (1996)]; (vii) bone diseases and osteoporosis [Tanaka, Endocr. 45 Suppl:S47-52 (1998); Reginster et al., Drugs R. D. 1(3):195-201 (1999)]; (viii) puberty and reproduction [Sharara and Giudice, J. Soc. Gynecol. Investig. 4(1):2-7 (1997); Artini et al., J. Endocrinol. Invest. 19(11):763-79 (1996); Homburg, Horm. Res. 45(1-2):81-5 (1996); Homburg and Farhi, Curr. Opin. Obstet. Gynecol. 7(3):220-3 (1995); Homburg and Ostergaard, Hum. Reprod. Update 1(3):264-75 (1995)]; (ix) GH therapy in elderly people [Bouillanne et al., Fundam. Clin. Pharmacol. 10(5):416-30 (1996)]; (x) wound management [Rasmussen, Dan. Med. Bull. 42(4):358-70 (1995)]; (xi) breast cancer [Wennbo and Tornell, Oncogene 19(8):1072-6 (2000)]; (xii) Prader-Willi syndrome [Ritzen et al., J. Pediatr. Endocrinol. Metab. 12 Suppl. 1:345-9 (1999); Nagai and Mori, Biomed. Pharmacother. 53(10):452-4 (1999)]; (xiii) immune reconstitution [Chappel, J. Acquir. Immune Defic. Syndr. Hum. Retrovirol. 20(5):423-31 (1999)]; (xiv) obesity [Scacchi et al., Int. J. Obes. Relat. Metab. Disord. 23(3):260-71 (1999)]; and (xv) Russell-Silver syndrome [Stanhope et al., Horm. Res. 49 Suppl. 2:37-40 (1998)]. For further reviews on GH therapies, see Tritos and Mantzoros, Am. J. Med. 105(1):44-57 (1998); Vance, Trans. Am. Clin. Climatol. Assoc. 109:87-96 (1998); Marcus and Hoffman, Annu. Rev. Pharmacol. Toxicol. 38:45-61 (1998).

hGH is marketed under the names NUTROPIN™ or PROTROPIN™ (Genentech, Inc.), NORDITROPIN™ (Novo Nordisk), GENOTROPIN™ (Pharmacia Upjohn), HUMATROPE™ (Eli Lilly) and SAIZEN™ or SEROSTIM™ (Serono). FDA approval covers the treatment of GH deficiency and Turner's syndrome.

Variants of hGH sequences, applications and production procedures are also known; see for example U.S. Pat. Nos. 4,658,021, 4,665,160, 5,068,317, 5,079,345, 5,424,199, 5,534,617, 5,597,709, 5,612,315, 5,633,352, 5,635,604, 5,688,666 and references cited therein.

Recently, the crystal structure of wild type hGH in a 1:2 complex with its receptor was solved at 2.8 Å resolution (de Vos et al., Science 255(5042):306-12 (1992), hereby expressly incorporated by reference). The structure of this complex is deposited as entry 3HHR in the Brookhaven Protein Data Bank (PDB). The crystal structure confirmed that the complex consists of one molecule of growth hormone per two molecules of receptor. The hormone adopts a four-helix bundle fold in which the first two helices run parallel to each other but antiparallel to the last two. In addition to the structures of wild type hGH (PDB entries 3HHR and 1HGU), five crystal structures of mutant forms of hGH are available in the literature and the PDB: 1HUW, 1AX1, 1A22, 1HWH, and 1HWG, hereby expressly incorporated by reference.

1HUW. PDB entry 1huw [Ultsch et al., J. Mol. Biol. 236(1):286-299 (1994)] contains a 2.0 Å resolution structure of an hGH variant in which 15 mutations (F10A, M14W, H18D, H21N, K41I, Y42H, L45W, Q46W, F54P, R64K, R167N, D171S, E174S, F176Y, and I179) were introduced by phage display mutagenesis, improving receptor binding affinity 400-fold.
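Mutations in this document are written as wild-type residue, position, new residue (for example F10A). A minimal sketch of parsing that notation and applying it to a sequence follows; it checks the stated wild-type residue so that a numbering offset fails loudly. The 22-residue fragment is assumed to be the N-terminus of mature hGH and should be checked against FIG. 1B; the function names are illustrative, not part of any described system.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# A substitution token: wild-type residue, 1-based position, new residue (e.g. "F10A").
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_mutation(token: str) -> tuple[str, int, str]:
    """Split a token such as 'F10A' into ('F', 10, 'A')."""
    match = MUTATION_RE.match(token.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {token!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(sequence: str, tokens: list[str]) -> str:
    """Apply substitutions to a sequence numbered from 1.

    The stated wild-type residue is checked against the sequence, so a
    numbering offset (e.g. precursor vs. mature numbering) raises an error.
    """
    residues = list(sequence)
    for token in tokens:
        wild_type, position, new = parse_mutation(token)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{token}: position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{token}: sequence has {residues[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


# Assumed N-terminal 22 residues of mature hGH (verify against FIG. 1B).
HGH_N_TERMINUS = "FPTIPLSRLFDNAMLRAHRLHQ"
print(apply_mutations(HGH_N_TERMINUS, ["F10A", "M14W", "H18D", "H21N"]))
# prints FPTIPLSRLADNAWLRADRLNQ
```

A token with no replacement residue, such as the bare "I179" in the list above, is rejected by the parser rather than guessed.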

1AX1. PDB entry 1ax1 [Atwell et al., Science 278:1125-1128 (1997)] contains a structure of a complex of an hGH mutant (G120R, K168R, D171T, K172Y, E174A, and F176Y) with its receptor mutated at position 104 (W104A). The resolution is 2.1 Å.

1A22. PDB entry 1a22 [Clackson et al., J. Mol. Biol. 277:1111-1128 (1998)] is a structure of the 1:1 complex of the G120R growth hormone mutant with its receptor at 2.6 Å resolution. The designed G120R mutant is an antagonist and can bind only one molecule of GHR.

1HWH. PDB entry 1hwh [Sundstrom et al., J. Biol. Chem. 271(50):32197-203 (1996)] is a crystal structure of the growth hormone antagonist mutant G120R with its receptor as a 1:1 complex at 2.9 Å resolution. The 1:1 complex is remarkably similar to the native growth hormone-receptor 1:2 complex. A comparison between the two structures reveals only minimal differences in the conformations of the hormone or its receptor in the two complexes, including the angle between the two immunoglobulin-like domains of the receptor.

1HWG. PDB entry 1hwg (Sundstrom et al., supra) contains a crystal structure of the antagonist mutant G120R of human growth hormone in a 1:2 complex with its receptor at 2.5 Å resolution. It reveals an important difference from the previously published 2.8 Å structure (3HHR): Trp-104 in the receptor, a key residue in the hormone-receptor interaction, has an altered conformation in the low-affinity site that enables a favorable hydrogen bond to form with Asp-116 of the hormone.

The available crystal structures of hGH allow further protein design and the generation of more stable proteins or of protein variants with altered activity. Several groups have applied and experimentally tested systematic, quantitative methods of protein design with the goal of developing general design algorithms (Hellinga et al., J. Mol. Biol. 222:763-785 (1991); Hurley et al., J. Mol. Biol. 224:1143-1154 (1992); Desjarlais et al., Protein Science 4:2006-2018 (1995); Harbury et al., Proc. Natl. Acad. Sci. U.S.A. 92:8408-8412 (1995); Klemba et al., Nat. Struct. Biol. 2:368-373 (1995); Nautiyal et al., Biochemistry 34:11645-11651 (1995); Betz et al., Biochemistry 35:6955-6962 (1996); Dahiyat et al., Protein Science 5:895-903 (1996); Dahiyat et al., Science 278:82-87 (1997); Dahiyat et al., J. Mol. Biol. 273:789-96; Dahiyat et al., Protein Sci. 6:1333-1337 (1997); Jones, Protein Science 3:567-574 (1994); Kono et al., Proteins: Structure, Function and Genetics 19:244-255 (1994)). These algorithms consider the spatial positioning and steric complementarity of side chains by explicitly modeling the atoms of the sequences under consideration. In particular, WO98/47089 and U.S. Ser. No. 09/127,926, both of which are expressly incorporated by reference, describe a system for protein design.
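Steric complementarity between explicitly modeled atoms is commonly scored with a Lennard-Jones 12-6 term, which rewards close packing and heavily penalizes overlaps. A minimal sketch, assuming a single placeholder contact distance and well depth for every atom pair (these are not the parameters of any cited design method):

```python
import math

Point = tuple[float, float, float]


def lennard_jones(r: float, r_min: float, well_depth: float) -> float:
    """12-6 Lennard-Jones energy written so its minimum is -well_depth at r = r_min."""
    ratio = r_min / r
    return well_depth * (ratio**12 - 2.0 * ratio**6)


def steric_energy(atoms_a: list[Point], atoms_b: list[Point],
                  r_min: float = 3.8, well_depth: float = 0.1) -> float:
    """Sum pairwise Lennard-Jones terms between two groups of atoms.

    r_min (angstroms) and well_depth are placeholder values chosen for
    illustration, not the parameters of a published design force field.
    """
    return sum(
        lennard_jones(math.dist(a, b), r_min, well_depth)
        for a in atoms_a
        for b in atoms_b
    )


print(steric_energy([(0.0, 0.0, 0.0)], [(3.8, 0.0, 0.0)]))  # near the optimum
print(steric_energy([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]))  # a severe clash
```

The steep r^-12 wall is what makes explicit-atom models sensitive to side-chain overlaps; many design methods soften it, but that choice is not shown here.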

A need still exists for proteins exhibiting both significant stability and growth hormone activity. Better stability would improve the convenience of shipment, storage and patient use of this product. Accordingly, it is an object of the invention to provide growth hormone activity (GHA) proteins with higher thermostability than the naturally occurring hormone, as well as nucleic acids and antibodies, for the treatment of hGH related disorders.

SUMMARY OF THE INVENTION

In accordance with the objects outlined above, the present invention provides non-naturally occurring growth hormone activity (GHA) proteins (i.e., proteins not found in nature) comprising amino acid sequences that are less than about 97% identical to human growth hormone (hGH). The GHA proteins have at least one biological property that is altered relative to hGH; for example, some GHA proteins will be more stable than hGH and bind to a cell comprising a growth hormone receptor (GHR) or prolactin receptor. Thus the invention provides GHA proteins with amino acid sequences that have at least about 5 amino acid substitutions as compared to the hGH sequence shown in FIG. 1A (SEQ ID NO:1).
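Percent identity and substitution count are related by simple arithmetic when the variant differs from hGH only by substitutions (no insertions or deletions). A minimal sketch, assuming two pre-aligned sequences of equal length; the function name is illustrative:

```python
def identity_and_substitutions(reference: str, variant: str) -> tuple[float, int]:
    """Percent identity and substitution count for two ungapped, equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    if not reference:
        raise ValueError("sequences must not be empty")
    substitutions = sum(1 for r, v in zip(reference, variant) if r != v)
    identity = 100.0 * (len(reference) - substitutions) / len(reference)
    return identity, substitutions


# For a 191-residue protein each substitution lowers identity by 100/191,
# about 0.52 percentage points; 6 substitutions give 185/191, about 96.86%.
reference = "A" * 191          # placeholder sequence of hGH length
variant = "V" * 6 + "A" * 185  # six substitutions
print(identity_and_substitutions(reference, variant))
```

For variants that contain insertions or deletions, the sequences would first need to be aligned before identity is computed.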

In a further aspect, the present invention provides non-naturally occurring GHA conformers that have three-dimensional backbone structures that substantially correspond to the three-dimensional backbone structure of hGH. The amino acid sequence of the GHA conformer and the amino acid sequence of hGH are less than about 97% identical. In one aspect, at least about 90% of the non-identical amino acids are in a core region of the conformer. In other aspects, about 100% of the non-identical amino acids are in a core region of the conformer.
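The "at least about 90% of the non-identical amino acids are in a core region" criterion is a ratio over mutated positions. A minimal sketch, assuming (for illustration only) that the positions listed for the CORE mutation pattern of FIG. 4A stand in for the core region; the full core definition is given elsewhere in the specification:

```python
# Positions listed for the CORE mutation pattern of FIG. 4A (mature hGH numbering).
# Treating this list as "the core region" is an assumption for illustration.
CORE_POSITIONS = frozenset({
    6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78,
    79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124,
    157, 161, 162, 163, 166, 170, 173, 176, 177, 180, 184,
})


def core_fraction(reference: str, variant: str,
                  core: frozenset[int] = CORE_POSITIONS) -> float:
    """Fraction of non-identical positions (1-based) that fall in the core set."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    mutated = [i for i, (r, v) in enumerate(zip(reference, variant), start=1) if r != v]
    if not mutated:
        return 0.0
    return sum(1 for i in mutated if i in core) / len(mutated)


# Placeholder sequences: mutate positions 13 and 90 (listed) and 118 (not listed).
reference = "A" * 191
residues = list(reference)
for position in (13, 90, 118):
    residues[position - 1] = "V"
print(core_fraction(reference, "".join(residues)))  # 2 of 3 mutations in the core
```

A conformer meeting the 90% aspect would return a value of at least 0.9 under whatever core definition is used.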

In an additional aspect, the changes are at amino acid positions selected from positions 6, 7, 13, 14, 17, 20, 26, 27, 28, 29, 30, 31, 34, 35, 36, 40, 43, 50, 54, 55, 56, 57, 58, 59, 70, 71, 73, 74, 75, 76, 77, 78, 79, 81, 82, 83, 84, 85, 87, 90, 92, 97, 98, 100, 102, 106, 107, 109, 110, 111, 113, 114, 115, 117, 118, 121, 125, 130, 132, 137, 139, 141, 142, 143, 145, 156, 157, 158, 159, 161, 162, 163, 166, 170, 173, 176, 177, 183, 184, 185, and 188. In a preferred aspect, the changes are at positions selected from positions 13, 27, 28, 54, 55, 79, 85, 90, 114, 161, or 184. In one aspect, the changes are at positions selected from positions 14, 26, 30, 34, 35, 40, 50, 57, 59, 71, 84, 92, 107, 109, 118, 125, 130, 139, 143, 158, or 183. In another aspect, the changes are at positions selected from positions 7, 29, 43, 77, 98, 100, 106, 111, 132, 137, 141, 142, 159, 161, 184, or 188. In another aspect, the changes are at positions selected from positions 26, 29, 30, 34, 40, 43, 50, 77, 84, 92, 100, 102, 109, 111, 118, 125, 132, 135, 137, 138, 141, 142, 143, 144, 145, or 147. Preferred embodiments include at least about 5 variations.

In a further aspect, the invention provides recombinant nucleic acids encoding the non-naturally occurring GHA proteins, expression vectors comprising the recombinant nucleic acids, and host cells comprising the recombinant nucleic acids and expression vectors.

In an additional aspect, the invention provides methods of producing the GHA proteins of the invention comprising culturing host cells comprising the recombinant nucleic acids under conditions suitable for expression of the nucleic acids. The proteins may optionally be recovered. In a further aspect, the invention provides pharmaceutical compositions comprising a GHA protein of the invention and a pharmaceutical carrier.

In an additional aspect, the invention provides methods for treating a GH-responsive condition comprising administering a GHA protein of the invention to a patient. The GH-responsive condition may be hypochondroplasia or idiopathic short stature; Turner's syndrome; growth delay in burned children; muscle wasting under conditions including, but not limited to, surgical stress, renal failure, muscular dystrophy, glucocorticoid administration or HIV infection; congestive heart failure or cardiovascular drug therapy; bone diseases or osteoporosis; disorders affecting puberty or reproduction; diffuse gastric bleeding; disorders relating to general anabolism, including, but not limited to, pseudoarthrosis, burn therapy, and old-age cachectic states; breast cancer; Prader-Willi syndrome; obesity; and Russell-Silver syndrome.

In an additional aspect, the invention provides GHA proteins for use in GH replacement therapies in GH-deficient adults; GH therapy in elderly people; wound healing, including but not limited to stasis ulcers, decubitus ulcers, or diabetic ulcers; the post-surgical (trauma) healing process; total parenteral nutrition (TPN); and the reconstitution of the immune system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A (SEQ ID NO:1) depicts the amino acid sequence of hGH as deposited at SWISS-PROT #P01241 (somatotropin precursor). Amino acid residues 1-26 correspond to the signal peptide and amino acid residues 27-217 correspond to the mature protein.

FIG. 1B (SEQ ID NO:14) depicts the amino acid sequence of hGH as used in the determination of the crystal structure of hGH with its receptor [PDB and GenBank # 3HHR; de Vos et al., Science 255(5042):306-12 (1992)], together with its secondary structure elements. Secondary structure element legend: H, alpha helix (4-helix); B, residue in isolated beta bridge; E, extended strand, participates in beta ladder; G, 3₁₀ helix (3-helix); I, pi helix (5-helix); T, hydrogen bonded turn; S, bend. Amino acid residues 1 to 190 of FIG. 1B (SEQ ID NO:14) correspond to amino acid residues 27-216 of the amino acid sequence depicted in FIG. 1A (SEQ ID NO:1). The amino acid numbers shown were used as the amino acid numbers in GHA proteins that also include F191.
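Two numbering schemes are in play: mature-protein positions (FIG. 1B, used for all mutations in this document) and precursor positions (FIG. 1A, which includes the 26-residue signal peptide). A minimal sketch of the conversion, following the 1-217 numbering of FIG. 1A; note that the FIG. 1C description numbers the signal peptide -26 to -1 instead, which this sketch does not model. The function names are illustrative.

```python
SIGNAL_PEPTIDE_LENGTH = 26   # residues 1-26 of FIG. 1A (SEQ ID NO:1)
MATURE_LENGTH = 191          # mature hGH, numbered 1-191 as in FIG. 1B


def mature_to_precursor(position: int) -> int:
    """Convert a mature-hGH position (FIG. 1B numbering) to FIG. 1A numbering."""
    if not 1 <= position <= MATURE_LENGTH:
        raise ValueError(f"mature positions run from 1 to {MATURE_LENGTH}")
    return position + SIGNAL_PEPTIDE_LENGTH


def precursor_to_mature(position: int) -> int:
    """Inverse mapping; signal-peptide positions (1-26) have no mature equivalent."""
    mature = position - SIGNAL_PEPTIDE_LENGTH
    if not 1 <= mature <= MATURE_LENGTH:
        raise ValueError(f"precursor position {position} is not in the mature protein")
    return mature


print(mature_to_precursor(1), mature_to_precursor(190), mature_to_precursor(191))
# prints 27 216 217
```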

FIG. 1C (SEQ ID NO:2) depicts the complete DNA sequence encoding wild type human growth hormone (Roskam and Rougeon, Nucleic Acids Res. 7(2):305-20 (1979); Martial et al., Science 205(4406):602-7 (1979); GenBank accession number V00519; similar sequences are deposited under #A12770, M13438, and J03071). The encoded sequence consists of the signal sequence, MATGSRTSLLLAFGLLCLPWLQEGSA (residues -26 to -1 of SEQ ID NO:1), and the 191 amino acids that constitute the mature protein (see FIGS. 1A and 1B) (SEQ ID NO:1). The DNA sequence of 799 nucleotides includes the coding sequence (bases 41 to 694) and untranslated sequences.
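The stated coding range can be checked against the protein lengths: bases 41 to 694 span 654 nucleotides, or 218 codons, which equals the 26-residue signal sequence plus the 191-residue mature protein plus one stop codon. A minimal sketch of that consistency check, assuming the range is 1-based, inclusive, and includes the stop codon:

```python
SIGNAL_SEQUENCE = "MATGSRTSLLLAFGLLCLPWLQEGSA"  # from the FIG. 1C description
MATURE_LENGTH = 191


def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of codons in a 1-based, inclusive coding range."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


codons = codon_count(41, 694)                        # 654 nt -> 218 codons
expected = len(SIGNAL_SEQUENCE) + MATURE_LENGTH + 1  # signal + mature + stop
print(codons, expected, codons == expected)          # prints 218 218 True
```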

FIG. 2 depicts the structure of wild type hGH as taken from the PDBentry 3HHR.

FIG. 3A depicts the CORE residues (SEQ ID NO:14). FIG. 3B depicts the BOUNDARY1 residues (SEQ ID NO:14). FIG. 3C depicts the BOUNDARY2 residues (SEQ ID NO:14). FIG. 3D depicts the CLUSTERED BOUNDARY residues (SEQ ID NO:14). These residue sets are selected for PDA and are described in detail herein.

FIG. 4A depicts the mutation pattern of the CORE sequences of hGH based on the analysis of the 1000 lowest-energy protein sequences generated by Monte Carlo analysis of the CORE sequence (only the amino acid residues of positions 6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78, 79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124, 157, 161, 162, 163, 166, 170, 173, 176, 177, 180, and 184 are given). The numbers following each amino acid indicate how often, within the 1000 sequences analyzed, the indicated amino acid residue was found. For example, at position 13, the hGH amino acid is alanine (see FIG. 1B) (SEQ ID NO:14); in GHA proteins, 943 of the top 1000 sequences had valine at this position, and 57 of the sequences had isoleucine. None of the sequences had alanine at this position. Similarly, for position 90 (valine in hGH), isoleucine (702) is preferred over valine (294).
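The per-position counts in FIG. 4A are a tally of residues across the retained sequences. A minimal sketch of that tally and of a "V943 I57" style summary; the four input sequences are toy placeholders, not the Monte Carlo output:

```python
from collections import Counter


def position_frequencies(sequences: list[str], positions: list[int]) -> dict[int, Counter]:
    """Count residues at each 1-based position across equal-length sequences."""
    return {pos: Counter(seq[pos - 1] for seq in sequences) for pos in positions}


def format_counts(counts: Counter) -> str:
    """Render counts most frequent first, e.g. 'V943 I57'."""
    return " ".join(f"{aa}{n}" for aa, n in counts.most_common())


# Toy input standing in for the designed sequences (not the actual data).
designed = ["AVL", "AIL", "AVM", "AVL"]
table = position_frequencies(designed, [2, 3])
print(format_counts(table[2]), "|", format_counts(table[3]))  # prints V3 I1 | L3 M1
```

With the full set of 1000 sequences, the counts at each position sum to 1000, as in the position 13 example (943 + 57).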

FIGS. 4B to 4E (SEQ ID NOS:3-6) depict preferred GHA protein sequences based on the PDA analysis of the hGH CORE sequence. Amino acid residues different from hGH (see FIG. 1B) (SEQ ID NO:14) are underlined and shown in bold.

FIG. 5A depicts the mutation pattern of the BOUNDARY1 sequences of hGH based on the analysis of the 1000 lowest-energy protein sequences generated by Monte Carlo analysis of the BOUNDARY1 sequence (only the amino acid residues of positions 6, 14, 26, 30, 32, 34, 35, 40, 50, 56, 57, 59, 66, 71, 74, 84, 92, 107, 109, 113, 118, 125, 130, 139, 143, 157, 158, and 183 are given). The numbers following each amino acid indicate how often, within the 1000 sequences analyzed, the indicated amino acid residue was found. For example, at position 118, the hGH amino acid is glutamic acid (see FIG. 1B) (SEQ ID NO:14); in GHA proteins, all 1000 sequences had leucine at this position. Similarly, for position 14 (methionine in hGH), leucine (945) is preferred over methionine (26).
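The Monte Carlo step can be illustrated with a generic Metropolis sampler over sequences. This is a minimal sketch, not the PDA procedure: the energy function, alphabet, temperature and step count are hypothetical placeholders. Accepted sequences are recorded and the lowest-energy ones returned, mirroring the idea of keeping a set of low-energy sequences for analysis.

```python
import math
import random
from typing import Callable


def metropolis_search(
    energy: Callable[[str], float],
    start: str,
    alphabet: str,
    steps: int,
    temperature: float,
    keep: int,
    seed: int = 0,
) -> list[tuple[str, float]]:
    """Metropolis Monte Carlo over sequences; returns the `keep` lowest-energy
    distinct sequences accepted during the walk, lowest first."""
    rng = random.Random(seed)
    current, current_energy = start, energy(start)
    visited = {current: current_energy}
    for _ in range(steps):
        position = rng.randrange(len(current))
        trial = current[:position] + rng.choice(alphabet) + current[position + 1:]
        trial_energy = energy(trial)
        delta = trial_energy - current_energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_energy = trial, trial_energy
            visited[current] = current_energy
    return sorted(visited.items(), key=lambda item: item[1])[:keep]


# Hypothetical toy energy: one unit penalty per position that differs from "VIL".
def toy_energy(sequence: str) -> float:
    return float(sum(a != b for a, b in zip(sequence, "VIL")))


best = metropolis_search(toy_energy, "AAA", "AVIL", steps=2000, temperature=0.5, keep=5)
print(best[0])
```

A real design run would use a physically based energy over rotamers on a fixed backbone; only the acceptance rule is shown here.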

FIGS. 5B and 5C (SEQ ID NOS:7-8) depict preferred GHA protein sequences based on the PDA analysis of the hGH BOUNDARY1 sequence. Amino acid residues different from hGH (see FIG. 1B) (SEQ ID NO:14) are underlined and shown in bold.

FIG. 6A depicts the mutation pattern of the BOUNDARY2 sequences of hGH based on the analysis of the 1000 lowest-energy protein sequences generated by Monte Carlo analysis of the BOUNDARY2 sequence (only the amino acid residues of positions 7, 29, 43, 70, 77, 87, 98, 100, 102, 104, 106, 111, 115, 132, 137, 140, 141, 142, 156, 159, 161, 184, 185, and 188 are given). The numbers following each amino acid indicate how often, within the 1000 sequences analyzed, the indicated amino acid residue was found. For example, at position 142, the hGH amino acid is threonine (see FIG. 1B) (SEQ ID NO:14); in GHA proteins, all 1000 sequences had valine at this position. Similarly, for position 7 (serine in hGH), lysine (873), tyrosine (43), arginine (26), phenylalanine (22), leucine (18), and valine (18) are preferred over serine.

FIG. 6B (SEQ ID NO:9) depicts a preferred GHA protein sequence based on the PDA analysis of the hGH BOUNDARY2 sequence. Amino acid residues different from hGH (see FIG. 1B) (SEQ ID NO:14) are underlined and shown in bold.

FIG. 7A depicts the mutation pattern of the CLUSTERED BOUNDARY sequences of hGH based on the analysis of the 1000 lowest-energy protein sequences generated by Monte Carlo analysis of the CLUSTERED BOUNDARY sequence (only the amino acid residues of positions 7, 14, 26, 29, 30, 34, 40, 43, 50, 57, 70, 77, 84, 87, 92, 98, 100, 102, 104, 106, 109, 111, 115, 118, 125, 132, 135, 137, 138, 140, 141, 142, 143, 144, 145, 147, 156, 159, 161, 184, 185, and 188 are given). The numbers following each amino acid indicate how often, within the 1000 sequences analyzed, the indicated amino acid residue was found. For example, at position 50, the hGH amino acid is threonine (see FIG. 1B) (SEQ ID NO:14); in GHA proteins, 950 of the top 1000 sequences had phenylalanine at this position, and 50 of the sequences had methionine. None of the sequences had threonine at this position. Similarly, for position 102 (valine in hGH), isoleucine (962) is preferred over valine (38).

FIGS. 7B to 7E (SEQ ID NOS:10-13) depict preferred GHA protein sequences based on the PDA analysis of the hGH CLUSTERED BOUNDARY sequence. Amino acid residues different from hGH (see FIG. 1B) (SEQ ID NO:14) are underlined and shown in bold.

FIG. 8 depicts the synthesis of a full-length gene and all possible mutations by PCR. Overlapping oligonucleotides corresponding to the full-length gene (black bar, Step 1) and comprising one or more desired mutations are synthesized, heated and annealed. Addition of DNA polymerase to the annealed oligonucleotides results in the 5′ to 3′ synthesis of DNA (Step 2) to produce longer DNA fragments (Step 3). Repeated cycles of heating, annealing, and DNA synthesis (Step 4) result in the production of longer DNA, including some full-length molecules. These can be selected by a second round of PCR using primers (indicated by arrows) corresponding to the ends of the full-length gene (Step 5).
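Step 1 of this scheme requires a set of overlapping oligonucleotides that tile the gene on alternating strands, so that neighbors anneal through their shared ends. A minimal sketch of that design step, assuming a fixed oligo length and overlap (hypothetical parameters; practical designs usually tune each overlap's melting temperature instead):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene: str, oligo_length: int, overlap: int) -> list[str]:
    """Split a gene into overlapping oligos that alternate between strands.

    Even-indexed oligos are top-strand sequence; odd-indexed oligos are the
    reverse complement, so neighbors anneal through their shared overlap.
    """
    if not 0 < overlap < oligo_length:
        raise ValueError("need 0 < overlap < oligo_length")
    step = oligo_length - overlap
    oligos, start = [], 0
    while True:
        end = min(start + oligo_length, len(gene))
        piece = gene[start:end]
        oligos.append(piece if len(oligos) % 2 == 0 else reverse_complement(piece))
        if end == len(gene):
            return oligos
        start += step


print(tile_oligos("ATGGCCTTAGCAAAGTGA", oligo_length=8, overlap=3))
# prints ['ATGGCCTT', 'TTGCTAAG', 'CAAAGTGA']
```

Mutations are introduced simply by editing the gene string before tiling, so each variant oligo carries the desired codon change.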

FIG. 9 depicts a preferred scheme for synthesizing an IbA library of the invention. The wild type gene, or any starting gene, such as the global minimum gene, can be used.

Oligonucleotides comprising sequences that encode different amino acids at the different variant positions (indicated in the Figure by box 1, box 2, and box 3) can be used during PCR. Those primers can be used in combination with standard primers. This generally requires fewer oligonucleotides and can result in fewer errors.
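A primer pool for one variant position can be generated by substituting alternative codons into a base oligo, and the pool can be mixed in proportion to how often each amino acid is wanted. A minimal sketch, assuming one arbitrarily chosen codon per amino acid from the standard genetic code (a real design would choose codons for the intended expression host); the oligo and counts below are placeholders, with the counts borrowed from the position 13 example of FIG. 4A:

```python
# One codon per amino acid, chosen arbitrarily from the standard genetic code.
CODON = {"A": "GCG", "I": "ATC", "L": "CTG", "V": "GTG"}


def codon_variants(oligo: str, codon_start: int, amino_acids: str) -> dict[str, str]:
    """Return one oligo per amino acid, replacing the codon at 0-based codon_start."""
    if codon_start < 0 or codon_start + 3 > len(oligo):
        raise ValueError("codon lies outside the oligo")
    return {aa: oligo[:codon_start] + CODON[aa] + oligo[codon_start + 3:]
            for aa in amino_acids}


def mixing_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of each variant primer in a pool, proportional to its count."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


print(codon_variants("AAAGCTTTT", 3, "VI"))
# prints {'V': 'AAAGTGTTT', 'I': 'AAAATCTTT'}
print(mixing_fractions({"V": 943, "I": 57}))
```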

FIG. 10 depicts an overlap extension method. At the top of FIG. 10A is the template DNA showing the locations of the regions to be mutated (black boxes) and the binding sites of the relevant primers (arrows). The primers R1 and R2 represent a pool of primers, each containing a different mutation; as described herein, this may be done using different ratios of primers if desired. The variant position is flanked by regions of homology sufficient for hybridization. Thus, as shown in this example, oligos R1 and F2 share a region of homology, as do oligos R2 and F3. In this example, three separate PCR reactions are done for Step 1. The first reaction contains the template plus oligos F1 and R1. The second reaction contains the template plus oligos F2 and R2, and the third contains the template and oligos F3 and R3. The reaction products are shown. In Step 2, the products from Step 1, tube 1 and Step 1, tube 2 are purified away from the primers and added to a fresh PCR reaction together with F1 and R4. After denaturation, the overlapping regions anneal and the second strand is synthesized. The product is then amplified by the outside primers, F1 and R4. In Step 3, the purified product from Step 2 is used in a third PCR reaction, together with the product of Step 1, tube 3 and the primers F1 and R3. The final product corresponds to the full-length gene and contains the required mutations. Alternatively, Step 2 and Step 3 can be performed in one PCR reaction.
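Before ordering primers, it can help to confirm in silico that the Step 1 fragments join into the intended full-length gene through their shared ends. A minimal sketch at the sequence level only, assuming exact identity across each overlap (no polymerase behavior, primer removal or mismatch tolerance is modeled); the fragments are placeholders:

```python
def overlap_extend(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two top-strand fragments whose ends share an identical overlap.

    Uses the longest exact suffix/prefix match of at least min_overlap nt.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} nt")


# Placeholder fragments standing in for the products of Step 1, tubes 1-3.
tube1, tube2, tube3 = "AAAACCCCGGGG", "CCCCGGGGTTTT", "TTTTACGT"
step2 = overlap_extend(tube1, tube2, min_overlap=4)
print(overlap_extend(step2, tube3, min_overlap=4))  # prints AAAACCCCGGGGTTTTACGT
```

Joining tubes 1 and 2 first and then adding tube 3 mirrors the Step 2 / Step 3 order described for FIG. 10.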

FIG. 11 depicts the ligation of PCR reaction products to synthesize the libraries of the invention. In this technique, the primers also contain a restriction endonuclease site (RE), generating either blunt ends, 5′ overhanging ends or 3′ overhanging ends. We set up three separate PCR reactions for Step 1. The first reaction contains the template plus oligos F1 and R1. The second reaction contains the template plus oligos F2 and R2, and the third contains the template and oligos F3 and R3. The reaction products are shown. In Step 2, the products of Step 1 are purified and then digested with the appropriate restriction endonuclease. The digestion products from Step 2, tube 1 and Step 2, tube 2 are ligated together with DNA ligase (Step 3). The products are then amplified in Step 4 using oligos F1 and R4. The whole process is then repeated by digesting the amplified products, ligating them to the digested products of Step 2, tube 3, and then amplifying the final product using oligos F1 and R3. It would also be possible to ligate all three PCR products from Step 1 together in one reaction, provided the two restriction sites (RE1 and RE2) were different.
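Whether a restriction site leaves a blunt, 5′ or 3′ overhanging end follows from where it cuts each strand. A minimal sketch, assuming palindromic recognition sites (the bottom strand is then cut at the mirror position); EcoRI, PstI and EcoRV are used only as familiar examples, not as the enzymes of FIG. 11:

```python
def cut_overhang(site: str, cut_index: int) -> tuple[str, str]:
    """Describe the end left by a palindromic site cut after site[:cut_index]
    on the top strand (the bottom strand is cut at the mirror position)."""
    mirror = len(site) - cut_index
    if cut_index < mirror:
        return "5'", site[cut_index:mirror]
    if cut_index > mirror:
        return "3'", site[mirror:cut_index]
    return "blunt", ""


def digest_top_strand(seq: str, site: str, cut_index: int) -> list[str]:
    """Top-strand fragments after cutting at every occurrence of site."""
    fragments, last = [], 0
    hit = seq.find(site)
    while hit != -1:
        fragments.append(seq[last:hit + cut_index])
        last = hit + cut_index
        hit = seq.find(site, hit + 1)
    fragments.append(seq[last:])
    return fragments


# Standard cut positions: EcoRI G^AATTC, PstI CTGCA^G, EcoRV GAT^ATC.
print(cut_overhang("GAATTC", 1), cut_overhang("CTGCAG", 5), cut_overhang("GATATC", 3))
print(digest_top_strand("AAGAATTCTT", "GAATTC", 1))  # prints ['AAG', 'AATTCTT']
```

Ends produced by the same palindromic site carry identical overhangs and can be ligated; blunt ends, as in the abutting-primer approach of FIG. 12, need no overhang match.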

FIG. 12 depicts blunt end ligation of PCR products. In this technique, oligos such as F2 and R1 or R2 and F3 do not overlap, but they abut. Again three separate PCR reactions are performed. The products from tube 1 and tube 2 (see FIG. 11, Step 1) are ligated and then amplified with outside primers F1 and R4. This product is then ligated with the product from Step 1, tube 3. The final products are then amplified with primers F1 and R3.

FIG. 13 depicts thermal denaturation (CD spectroscopy) curves for wild type hGH and GHA mutants b and d. Mutants b and d show increases in thermostability over hGH of 16 °C and 13 °C, respectively.
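A thermostability shift of this kind is usually read as the difference between the midpoint temperatures (Tm) of two melting curves. A minimal sketch of one simple estimate, assuming flat folded and unfolded baselines given by the first and last points; the curves below are synthetic two-state sigmoids, not the measured data of FIG. 13, and this is not necessarily the analysis used for that figure:

```python
import math


def melting_temperature(temps: list[float], signal: list[float]) -> float:
    """Tm as the temperature where the normalized signal crosses 0.5.

    Assumes temps are ascending and that the first and last points are the
    folded and unfolded plateaus (flat baselines).
    """
    low, high = signal[0], signal[-1]
    if low == high:
        raise ValueError("no transition: end points are equal")
    fraction = [(s - low) / (high - low) for s in signal]
    for i in range(len(temps) - 1):
        f1, f2 = fraction[i], fraction[i + 1]
        if f1 != f2 and (f1 - 0.5) * (f2 - 0.5) <= 0:
            # Linear interpolation between the two bracketing points.
            return temps[i] + (0.5 - f1) * (temps[i + 1] - temps[i]) / (f2 - f1)
    raise ValueError("signal never crosses its midpoint")


def synthetic_curve(temps: list[float], tm: float, width: float = 2.0) -> list[float]:
    """Synthetic two-state melt (not measured data)."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


temps = [float(t) for t in range(20, 101)]
wild_type_tm = melting_temperature(temps, synthetic_curve(temps, 60.0))
mutant_tm = melting_temperature(temps, synthetic_curve(temps, 76.0))
print(round(mutant_tm - wild_type_tm, 1))
```

Because the curve is normalized between its end points, the same function works whether the CD signal rises or falls on unfolding; sloping baselines would need to be fitted separately.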

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to novel proteins and nucleic acids possessing growth hormone activity (sometimes referred to herein as “GHA proteins” and “GHA nucleic acids”). The proteins are generated using a computational modeling system previously described in WO98/47089 and U.S. Ser. Nos. 09/058,459, 09/127,926, 60/104,612, 60/158,700, 09/419,351, 60/181,630, 60/186,904, U.S. patent application entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat), and PCT US98/07254, all of which are expressly incorporated by reference in their entirety. This system allows the generation of extremely stable proteins without necessarily disturbing the biological functions of the protein itself. In this way, novel GHA proteins and nucleic acids are generated that can have a plurality of mutations in comparison to the wild-type protein yet retain significant activity.

Generally, a variety of computational methods can be used to generate the GHA proteins of the invention. In a preferred embodiment, sequence-based methods are used. Alternatively, structure-based methods, such as PDA, described in detail below, are used.

Similarly, molecular dynamics calculations can be used to computationally screen sequences by individually calculating mutant sequence scores and compiling a rank-ordered list.

In a preferred embodiment, residue pair potentials can be used to score sequences (Miyazawa et al., Macromolecules 18(3):534-552 (1985), expressly incorporated by reference) during computational screening.
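Scoring with a residue pair potential amounts to summing a pair energy over the residue contacts of the structure, after which candidates can be compiled into a rank-ordered list. A minimal sketch, assuming a hypothetical three-position contact map and made-up pair energies (these are not Miyazawa-Jernigan values):

```python
Contact = tuple[int, int]
Potential = dict[tuple[str, str], float]


def pair_score(sequence: str, contacts: list[Contact], potential: Potential) -> float:
    """Sum a symmetric residue-pair potential over contacting positions (0-based).

    Pairs missing from the table contribute zero.
    """
    total = 0.0
    for i, j in contacts:
        a, b = sequence[i], sequence[j]
        total += potential.get((a, b), potential.get((b, a), 0.0))
    return total


def rank_sequences(sequences: list[str], contacts: list[Contact],
                   potential: Potential) -> list[tuple[float, str]]:
    """Score each candidate and return (score, sequence) pairs, lowest first."""
    return sorted((pair_score(s, contacts, potential), s) for s in sequences)


# Hypothetical contact map and pair energies, for illustration only.
CONTACTS = [(0, 1), (1, 2)]
POTENTIAL = {("L", "L"): -1.0, ("L", "K"): 0.5}
print(rank_sequences(["LKL", "LLL", "KKK"], CONTACTS, POTENTIAL))
# prints [(-2.0, 'LLL'), (0.0, 'KKK'), (1.0, 'LKL')]
```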

In a preferred embodiment, sequence profile scores (Bowie et al., Science 253(5016):164-70 (1991), incorporated by reference) and/or potentials of mean force (Hendlich et al., J. Mol. Biol. 216(1):167-180 (1990), also incorporated by reference) can also be calculated to score sequences. These methods assess the match between a sequence and a 3D protein structure and hence can act to screen for fidelity to the protein structure. By using different scoring functions to rank sequences, different regions of sequence space can be sampled in the computational screen.
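A structure-based profile assigns each position an environment class and scores each residue type against that class, so a sequence's fit to the structure is the sum of per-position scores. A minimal, gap-free sketch of that idea; the environment class names and scores are hypothetical placeholders, and the alignment step of the published profile methods is omitted:

```python
# Hypothetical environment classes and scores, for illustration only.
ENVIRONMENT_SCORES = {
    "buried_helix": {"L": 1.2, "I": 1.0, "V": 0.8, "K": -1.0},
    "exposed_loop": {"G": 0.9, "K": 0.6, "L": -0.5},
}


def build_profile(environments: list[str]) -> list[dict[str, float]]:
    """Turn a per-position environment assignment into a position-specific profile."""
    return [ENVIRONMENT_SCORES[env] for env in environments]


def profile_score(sequence: str, profile: list[dict[str, float]]) -> float:
    """Ungapped sequence-to-structure score: sum of per-position residue scores.

    Residues missing from a position's table contribute zero.
    """
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    return sum(column.get(residue, 0.0) for residue, column in zip(sequence, profile))


profile = build_profile(["buried_helix", "exposed_loop", "buried_helix"])
print(profile_score("LGI", profile), profile_score("KLK", profile))
```

A sequence that places hydrophobic residues in buried positions and small or charged residues in exposed loops scores higher, which is the sense in which such scores screen for fidelity to the structure.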

Furthermore, scoring functions can be used to screen for sequences that would create metal or co-factor binding sites in the protein (Hellinga, Fold Des. 3(1):R1-8 (1998), hereby expressly incorporated by reference). Similarly, scoring functions can be used to screen for sequences that would create disulfide bonds in the protein. These potentials attempt to specifically modify a protein structure to introduce a new structural motif.
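A first-pass screen for possible disulfide sites can be purely geometric: find residue pairs whose C-beta atoms lie at a distance compatible with a cystine bridge. A minimal sketch, assuming an approximate C-beta distance window of 3.4 to 4.5 Å and a minimum sequence separation of 3 (both illustrative assumptions); the coordinates are placeholders, and a real screen would also model cysteine side-chain geometry:

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def disulfide_candidates(cb_coords: dict[int, Point], min_dist: float = 3.4,
                         max_dist: float = 4.5,
                         min_separation: int = 3) -> list[tuple[int, int]]:
    """Residue pairs whose C-beta atoms fall inside an assumed disulfide window.

    The distance window and sequence-separation cutoff are illustrative
    assumptions, not values taken from a published screening method.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(cb_coords.items()), 2):
        if j - i < min_separation:
            continue
        if min_dist <= math.dist(a, b) <= max_dist:
            pairs.append((i, j))
    return pairs


# Placeholder C-beta coordinates (angstroms), keyed by residue number.
coords = {1: (0.0, 0.0, 0.0), 2: (3.9, 0.0, 0.0), 5: (3.8, 0.0, 0.0), 9: (10.0, 0.0, 0.0)}
print(disulfide_candidates(coords))  # prints [(1, 5)]
```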

In a preferred embodiment, sequence and/or structural alignment programs can be used to generate the GHA proteins of the invention. As is known in the art, there are a number of sequence-based alignment programs, including, for example, Smith-Waterman searches, Needleman-Wunsch, Double Affine Smith-Waterman, frame search, Gribskov/GCG profile search, Gribskov/GCG profile scan, profile frame search, Bucher generalized profiles, Hidden Markov models, Hframe, Double Frame, Blast, Psi-Blast, Clustal, and GeneWise.
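Needleman-Wunsch, one of the programs named above, is a dynamic-programming global alignment. A minimal sketch with simple match/mismatch scores and a linear gap penalty (placeholder values); protein alignments in practice use a substitution matrix such as BLOSUM62 and affine gap penalties:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # prints (1, 'ACGT', 'A-GT')
```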

As is known in the art, there are a number of sequence alignment methodologies that can be used. For example, sequence homology-based alignment methods can be used to create sequence alignments of proteins related to the target structure (Altschul et al., J. Mol. Biol. 215(3):403-410 (1990); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997), both incorporated by reference). These sequence alignments are then examined to determine the observed sequence variations, which are tabulated to define a set of GHA proteins.

Sequence-based alignments can be used in a variety of ways. For example, a number of related proteins can be aligned, as is known in the art, and the “variable” and “conserved” residues defined; that is, the residues that vary or remain identical between the family members can be defined. These results can be used to generate a probability table, as outlined below. Similarly, these sequence variations can be tabulated and a secondary library defined from them as described below. Alternatively, the allowed sequence variations can be used to define the amino acids considered at each position during the computational screening. Another variation is to bias the score for amino acids that occur in the sequence alignment, thereby increasing the likelihood that they are found during computational screening while still allowing consideration of other amino acids. This bias would result in a focused library of GHA proteins but would not eliminate from consideration amino acids not found in the alignment. In addition, a number of other types of bias may be introduced. For example, diversity may be forced; that is, a “conserved” residue is chosen and altered to force diversity on the protein and thus sample a greater portion of the sequence space. Alternatively, the positions of high variability between family members (i.e., low conservation) can be randomized, using either all or a subset of amino acids. Similarly, outlier residues, either positional outliers or side chain outliers, may be eliminated.
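Defining conserved columns and building a per-position probability table from an alignment can be sketched as follows. A pseudocount is one simple way to bias toward observed residues without excluding unobserved ones, as described above; it is an illustrative choice, not necessarily the weighting used in the described system. The three-sequence alignment is a toy placeholder.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(alignment: list[str],
                         pseudocount: float = 0.0) -> list[dict[str, float]]:
    """Per-column residue probabilities from an alignment; gaps ('-') are ignored.

    A pseudocount > 0 favors observed residues without giving unobserved
    amino acids zero probability.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("alignment rows must have equal length")
    table = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            table.append({})  # all-gap column with no pseudocount
            continue
        table.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return table


def conserved_columns(alignment: list[str]) -> list[int]:
    """0-based columns that are identical in every aligned sequence."""
    return [col for col in range(len(alignment[0]))
            if len({row[col] for row in alignment}) == 1]


family = ["ALK", "AVK", "AIK"]  # toy alignment
print(conserved_columns(family))  # prints [0, 2]
```

Columns not returned by `conserved_columns` are the variable positions that could be randomized, while the smoothed probabilities could weight the amino acids considered at each position.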

Similarly, structural alignment of structurally related proteins can be done to generate sequence alignments (Orengo et al., Structure 5(8):1093-108 (1997); Holm et al., Nucleic Acids Res. 26(1):316-9 (1998), both of which are incorporated by reference). These sequence alignments can then be examined to determine the observed sequence variations. Libraries can be generated by predicting secondary structure from sequence, and then selecting sequences that are compatible with the predicted secondary structure. There are a number of secondary structure prediction methods, such as helix-coil transition theory (Munoz and Serrano, Biopolymers 41:495, 1997), neural networks, local structure alignment and others (see, e.g., Selbig et al., Bioinformatics 15:1039-46, 1999).

Similarly, as outlined above, other computational methods are known, including, but not limited to, sequence profiling [Bowie and Eisenberg, Science 253(5016):164-70 (1991)], rotamer library selections [Dahiyat and Mayo, Protein Sci. 5(5):895-903 (1996); Dahiyat and Mayo, Science 278(5335):82-7 (1997); Desjarlais and Handel, Protein Science 4:2006-2018 (1995); Harbury et al., Proc. Natl. Acad. Sci. U.S.A. 92(18):8408-8412 (1995); Kono et al., Proteins: Structure, Function and Genetics 19:244-255 (1994); Hellinga and Richards, Proc. Natl. Acad. Sci. U.S.A. 91:5803-5807 (1994)]; residue pair potentials [Jones, Protein Science 3:567-574 (1994)]; PROSA [Hendlich et al., J. Mol. Biol. 216:167-180 (1990)]; THREADER [Jones et al., Nature 358:86-89 (1992)]; and other inverse folding methods such as those described by Simons et al. [Proteins 34:535-543 (1999)], Levitt and Gerstein [Proc. Natl. Acad. Sci. U.S.A. 95:5913-5920 (1998)], Godzik and Skolnick [Proc. Natl. Acad. Sci. U.S.A. 89:12098-102 (1992)], Godzik et al. [J. Mol. Biol. 227:227-38 (1992)] and two profile methods [Gribskov et al., Proc. Natl. Acad. Sci. U.S.A. 84:4355-4358 (1987); Fischer and Eisenberg, Protein Sci. 5:947-955 (1996); Rice and Eisenberg, J. Mol. Biol. 267:1026-1038 (1997)], all of which are expressly incorporated by reference. In addition, other computational methods such as those described by Koehl and Levitt (J. Mol. Biol. 293:1161-1181 (1999); J. Mol. Biol. 293:1183-1193 (1999); expressly incorporated by reference) can be used to create a protein sequence library, which can optionally then be used to generate a smaller secondary library for use in experimental screening for improved properties and function. In addition, there are computational methods based on forcefield calculations, such as self-consistent mean field (SCMF), that can be used as well; for SCMF, see Delarue et al., Pac. Symp. Biocomput. 109-21 (1997); Koehl et al., J. Mol. Biol. 239:249-75 (1994); Koehl et al., Nat. Struct. Biol. 2:163-70 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222-6 (1996); Koehl et al., J. Mol. Biol. 293:1183-93 (1999); Koehl et al., J. Mol. Biol. 293:1161-81 (1999); Lee, J. Mol. Biol. 236:918-39 (1994); and Vasquez, Biopolymers 36:53-70 (1995); all of which are expressly incorporated by reference. Other forcefield calculations that can be used to optimize the conformation of a sequence within a computational method, or to generate de novo optimized sequences as outlined herein, include, but are not limited to, OPLS-AA [Jorgensen et al., J. Am. Chem. Soc. 118:11225-11236 (1996); Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)]; OPLS [Jorgensen et al., J. Am. Chem. Soc. 110:1657ff (1988); Jorgensen et al., J. Am. Chem. Soc. 112:4768ff (1990)]; UNRES [United Residue Forcefield; Liwo et al., Protein Science 2:1697-1714 (1993); Liwo et al., Protein Science 2:1715-1731 (1993); Liwo et al., J. Comp. Chem. 18:849-873 (1997); Liwo et al., J. Comp. Chem. 18:874-884 (1997); Liwo et al., J. Comp. Chem. 19:259-276 (1998)]; Forcefield for Protein Structure Prediction [Liwo et al., Proc. Natl. Acad. Sci. U.S.A. 96:5482-5485 (1999)]; ECEPP/3 [Liwo et al., J. Protein Chem. 13(4):375-80 (1994)]; AMBER 1.1 force field [Weiner et al., J. Am. Chem. Soc. 106:765-784]; AMBER 3.0 force field [U.C. Singh et al., Proc. Natl. Acad. Sci. U.S.A. 82:755-759 (1985)]; CHARMM and CHARMM22 [Brooks et al., J. Comp. Chem. 4:187-217]; cvff3.0 [Dauber-Osguthorpe et al., Proteins: Structure, Function and Genetics 4:31-47 (1988)]; cff91 [Maple et al., J. Comp. Chem. 15:162-182]; also, the DISCOVER (cvff and cff91) and AMBER forcefields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego, Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego, Calif.), all of which are expressly incorporated by reference. In fact, as is outlined below, these forcefield methods may be used to generate the secondary library directly; that is, no primary library is generated; rather, these methods can be used to generate a probability table from which the secondary library is directly generated.

In a preferred embodiment, the computational method used to generate the primary library is Protein Design Automation (PDA), as is described in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, 09/127,926, 60/104,612, 60/158,700, 09/419,351, 60/181,630, 60/186,904, U.S. patent application entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat), and PCT US98/07254, all of which are expressly incorporated herein by reference. Briefly, PDA can be described as follows. A known protein structure is used as the starting point. The residues to be optimized are then identified, which may be the entire sequence or subset(s) thereof. The side chains of any positions to be varied are then removed. The resulting structure, consisting of the protein backbone and the remaining side chains, is called the template. Each variable residue position is then preferably classified as a core residue, a surface residue, or a boundary residue; each classification defines a subset of possible amino acid residues for the position (for example, core residues generally will be selected from the set of hydrophobic residues, surface residues generally will be selected from the hydrophilic residues, and boundary residues may be either). Each amino acid can be represented by a discrete set of all allowed conformers of each side chain, called rotamers. Thus, to arrive at an optimal sequence for a backbone, all possible sequences of rotamers must be screened, where each backbone position can be occupied either by each amino acid in all its possible rotameric states, or by a subset of amino acids, and thus a subset of rotamers.

Two sets of interactions are then calculated for each rotamer at every position: the interaction of the rotamer side chain with all or part of the backbone (the “singles” energy, also called the rotamer/template or rotamer/backbone energy), and the interaction of the rotamer side chain with all other possible rotamers at every other position or a subset of the other positions (the “doubles” energy, also called the rotamer/rotamer energy). The energy of each of these interactions is calculated through the use of a variety of scoring functions, which include the energy of van der Waals forces, the energy of hydrogen bonding, the energy of secondary structure propensity, the energy of surface area solvation and the electrostatics. Thus, the total energy of each rotamer interaction, both with the backbone and other rotamers, is calculated and stored in matrix form.

The discrete nature of rotamer sets allows a simple calculation of the number of rotamer sequences to be tested. A backbone of length $n$ with $m$ possible rotamers per position will have $m^n$ possible rotamer sequences, a number which grows exponentially with sequence length and renders the calculations either unwieldy or impossible in real time. Accordingly, to solve this combinatorial search problem, a “Dead End Elimination” (DEE) calculation is performed. The DEE calculation is based on the fact that if the worst total interaction of a first rotamer is still better than the best total interaction of a second rotamer, then the second rotamer cannot be part of the global optimum solution. Since the energies of all rotamers have already been calculated, the DEE approach only requires sums over the sequence length to test and eliminate rotamers, which speeds up the calculations considerably. DEE can be rerun comparing pairs of rotamers, or combinations of rotamers, which can eventually result in the determination of a single sequence that represents the global optimum energy.
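To make the scale concrete, 10 rotamers at each of 50 positions already gives $10^{50}$ rotamer sequences. The elimination criterion itself can be sketched in Python under stated assumptions: singles energies are stored as `singles[i][r]`, doubles energies as `doubles[(i, j)][r][s]` for `i < j`, and only the single-rotamer criterion is applied (not the pair or combination variants mentioned above). A rotamer `r` is discarded when even its best-case total energy is worse than the worst-case total energy of some competitor `t` at the same position. The names and data layout are hypothetical.

```python
def pair_energy(doubles, i, r, j, s):
    """Doubles energy between rotamer r at position i and rotamer s at position j.
    The (hypothetical) table stores doubles[(i, j)][r][s] only for i < j."""
    if i < j:
        return doubles[(i, j)][r][s]
    return doubles[(j, i)][s][r]


def dead_end_elimination(singles, doubles):
    """Repeat the single-rotamer DEE test until no more rotamers can be removed.
    Returns the sorted indices of the surviving rotamers at each position."""
    n = len(singles)
    alive = [set(range(len(row))) for row in singles]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # Best (lowest) total energy rotamer r could possibly achieve.
                best_r = singles[i][r] + sum(
                    min(pair_energy(doubles, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Worst (highest) total energy competitor t could possibly reach.
                    worst_t = singles[i][t] + sum(
                        max(pair_energy(doubles, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:
                        # r can never beat t, so r is a dead end.
                        alive[i].discard(r)
                        changed = True
                        break
    return [sorted(a) for a in alive]


# Toy problem: two positions with two rotamers each (energies are made up).
singles = [[0.0, 5.0], [0.0, 1.0]]
doubles = {(0, 1): [[0.0, 1.0], [2.0, 0.0]]}
print(dead_end_elimination(singles, doubles))
```

In this toy case DEE prunes down to a single rotamer per position, which is the global minimum; on real problems the survivors are usually several per position and are then searched further.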

Once the global solution has been found, a Monte Carlo search may be done to generate a rank-ordered list of sequences in the neighborhood of the DEE solution. Starting at the DEE solution, random positions are changed to other rotamers, and the new sequence energy is calculated. If the new sequence meets the criteria for acceptance, it is used as a starting point for another jump. After a predetermined number of jumps, a rank-ordered list of sequences is generated. Monte Carlo searching is a sampling technique to explore sequence space around the global minimum or to find new local minima distant in sequence space. As is outlined further below, there are other sampling techniques that can be used, including Boltzmann sampling, genetic algorithm techniques and simulated annealing. In addition, for all the sampling techniques, the kinds of jumps allowed can be altered (e.g. random jumps to random residues, biased jumps (to or away from wild-type, for example), jumps to biased residues (to or away from similar residues, for example), etc.). Similarly, for all the sampling techniques, the acceptance criteria of whether a sampling jump is accepted can be altered.
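The Monte Carlo step can be sketched as follows. The text leaves the acceptance criteria open; this sketch assumes the Metropolis rule (always accept downhill jumps, accept uphill jumps with probability $e^{-\Delta E/kT}$) and random single-position jumps. The energy table layout (`singles[i][r]`, `doubles[(i, j)][r][s]` for `i < j`) and all function names are hypothetical.

```python
import math
import random


def sequence_energy(assign, singles, doubles):
    """Singles plus doubles energy of one rotamer assignment (one rotamer per position).
    doubles[(i, j)][r][s] is assumed to be stored only for i < j."""
    energy = sum(singles[i][r] for i, r in enumerate(assign))
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            energy += doubles[(i, j)][assign[i]][assign[j]]
    return energy


def monte_carlo_neighbors(start, singles, doubles, n_jumps=1000, kt=1.0, keep=10, seed=0):
    """Random single-position jumps from `start` with Metropolis acceptance.
    Returns the `keep` lowest-energy assignments visited, as (assignment, energy)."""
    rng = random.Random(seed)
    current = list(start)
    e_cur = sequence_energy(current, singles, doubles)
    visited = {tuple(current): e_cur}
    for _ in range(n_jumps):
        trial = list(current)
        pos = rng.randrange(len(trial))
        trial[pos] = rng.randrange(len(singles[pos]))  # jump to a random rotamer
        e_new = sequence_energy(trial, singles, doubles)
        visited[tuple(trial)] = e_new
        # Accept downhill jumps always, uphill jumps with Boltzmann probability.
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / kt):
            current, e_cur = trial, e_new
    # Rank-ordered list of everything sampled, lowest energy first.
    return sorted(visited.items(), key=lambda item: item[1])[:keep]


singles = [[0.0, 5.0], [0.0, 1.0]]
doubles = {(0, 1): [[0.0, 1.0], [2.0, 0.0]]}
for assignment, energy in monte_carlo_neighbors([0, 0], singles, doubles, n_jumps=200, keep=4):
    print(assignment, energy)
```

Swapping in a different acceptance rule or jump generator only touches the two commented lines, which mirrors the point above that both can be altered independently.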

As outlined in U.S. Ser. No. 09/127,926, the protein backbone (comprising, for a naturally occurring protein, the nitrogen, the carbonyl carbon, the α-carbon and the carbonyl oxygen, along with the direction of the vector from the α-carbon to the β-carbon) may be altered prior to the computational analysis by varying a set of parameters called supersecondary structure parameters.

Once a protein structure backbone is generated (with alterations, as outlined above) and input into the computer, explicit hydrogens are added if not included within the structure (for example, if the structure was generated by X-ray crystallography, hydrogens must be added). After hydrogen addition, energy minimization of the structure is run to relax the hydrogens as well as the other atoms, bond angles and bond lengths. In a preferred embodiment, this is done by running a number of steps of conjugate gradient minimization [Mayo et al., J. Phys. Chem. 94:8897 (1990)] of atomic coordinate positions to minimize the Dreiding force field with no electrostatics. Generally from about 10 to about 250 steps is preferred, with about 50 being most preferred.

The protein backbone structure contains at least one variable residue position. As is known in the art, the residues, or amino acids, of proteins are generally sequentially numbered starting with the N-terminus of the protein. Thus a protein having a methionine at its N-terminus is said to have a methionine at residue or amino acid position 1, with the next residues as 2, 3, 4, etc. At each position, the wild type (i.e. naturally occurring) protein may have one of at least 20 amino acids, in any number of rotamers. By “variable residue position” herein is meant an amino acid position of the protein to be designed that is not fixed in the design method as a specific residue or rotamer, generally the wild-type residue or rotamer.

In a preferred embodiment, all of the residue positions of the protein are variable. That is, every amino acid side chain may be altered in the methods of the present invention. This is particularly desirable for smaller proteins, although the present methods allow the design of larger proteins as well. While there is no theoretical limit to the length of the protein which may be designed this way, there is a practical computational limit.

In an alternate preferred embodiment, only some of the residue positions of the protein are variable, and the remainder are “fixed”, that is, they are identified in the three dimensional structure as being in a set conformation. In some embodiments, a fixed position is left in its original conformation (which may or may not correlate to a specific rotamer of the rotamer library being used). Alternatively, residues may be fixed as a non-wild type residue; for example, when known site-directed mutagenesis techniques have shown that a particular residue is desirable (for example, to eliminate a proteolytic site or alter the substrate specificity of an enzyme), the residue may be fixed as a particular amino acid. Alternatively, the methods of the present invention may be used to evaluate mutations de novo, as is discussed below. In an alternate preferred embodiment, a fixed position may be “floated”; the amino acid at that position is fixed, but different rotamers of that amino acid are tested. In this embodiment, the variable residues may be at least one, or anywhere from 0.1% to 99.9% of the total number of residues. Thus, for example, it may be possible to change only a few (or one) residues, or most of the residues, with all possibilities in between.

In a preferred embodiment, residues which can be fixed include, but are not limited to, structurally or biologically functional residues; alternatively, biologically functional residues may specifically not be fixed. For example, residues which are known to be important for biological activity, such as the residues that form the binding site for a binding partner (ligand/receptor, antigen/antibody, etc.), phosphorylation or glycosylation sites which are crucial to biological function, or structurally important residues, such as disulfide bridges, metal binding sites, critical hydrogen bonding residues, residues critical for backbone conformation such as proline or glycine, residues critical for packing interactions, etc. may all be fixed in a conformation or as a single rotamer, or “floated”.

Similarly, residues which may be chosen as variable residues may be those that confer undesirable biological attributes, such as susceptibility to proteolytic degradation, dimerization or aggregation sites, glycosylation sites which may lead to immune responses, unwanted binding activity, unwanted allostery, undesirable enzyme activity but with a preservation of binding, etc.

In a preferred embodiment, each variable position is classified as either a core, surface or boundary residue position, although in some cases, as explained below, the variable position may be set to glycine to minimize backbone strain. In addition, as outlined herein, residues need not be classified; they can be chosen as variable and any set of amino acids may be used. Any combination of core, surface and boundary positions can be utilized: core, surface and boundary residues; core and surface residues; core and boundary residues; and surface and boundary residues, as well as core residues alone, surface residues alone, or boundary residues alone.

The classification of residue positions as core, surface or boundary may be done in several ways, as will be appreciated by those in the art. In a preferred embodiment, the classification is done via a visual scan of the original protein backbone structure, including the side chains, and assigning a classification based on a subjective evaluation of one skilled in the art of protein modelling. Alternatively, a preferred embodiment utilizes an assessment of the orientation of the Cα-Cβ vectors relative to a solvent accessible surface computed using only the template Cα atoms, as outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, 09/127,926, 60/104,612, 60/158,700, 09/419,351, 60/181,630, 60/186,904, U.S. patent application entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat) and PCT US98/07254. Alternatively, a surface area calculation can be done.

Suitable core and boundary positions for GHA proteins are outlined below.

Once each variable position is classified as either core, surface or boundary, a set of amino acid side chains, and thus a set of rotamers, is assigned to each position. That is, the set of possible amino acid side chains that the program will allow to be considered at any particular position is chosen. Subsequently, once the possible amino acid side chains are chosen, the set of rotamers that will be evaluated at a particular position can be determined. Thus, a core residue will generally be selected from the group of hydrophobic residues consisting of alanine, valine, isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and methionine (in some embodiments, when the α scaling factor of the van der Waals scoring function, described below, is low, methionine is removed from the set), and the rotamer set for each core position potentially includes rotamers for these eight amino acid side chains (all the rotamers if a backbone-independent rotamer library is used, and subsets if a backbone-dependent rotamer library is used). Similarly, surface positions are generally selected from the group of hydrophilic residues consisting of alanine, serine, threonine, aspartic acid, asparagine, glutamine, glutamic acid, arginine, lysine and histidine. The rotamer set for each surface position thus includes rotamers for these ten residues. Finally, boundary positions are generally chosen from alanine, serine, threonine, aspartic acid, asparagine, glutamine, glutamic acid, arginine, lysine, histidine, valine, isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and methionine. The rotamer set for each boundary position thus potentially includes every rotamer for these seventeen residues (assuming cysteine, glycine and proline are not used, although they can be). Additionally, in some preferred embodiments, a set of 18 naturally occurring amino acids (all except cysteine and proline, which are known to be particularly disruptive) is used.

Thus, as will be appreciated by those in the art, there is a computational benefit to classifying the residue positions, as it decreases the number of calculations. It should also be noted that there may be situations where the sets of core, boundary and surface residues are altered from those described above; for example, under some circumstances, one or more amino acids is either added or subtracted from the set of allowed amino acids. For example, some proteins which dimerize or multimerize, or have ligand binding sites, may contain hydrophobic surface residues, etc. In addition, residues that do not allow helix “capping” or the favorable interaction with an α-helix dipole may be subtracted from a set of allowed residues. This modification of amino acid groups is done on a residue by residue basis.
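The residue sets above, and the saving that classification brings, can be written down directly. The Python sketch below encodes the core (8), surface (10) and boundary (17) sets and the 18-residue alternative as one-letter codes; the helper names and the `drop_met` switch are illustrative assumptions. For three variable positions classified core, surface and boundary, the amino acid sequence space is $8 \times 10 \times 17 = 1360$, versus $20^3 = 8000$ with no classification (before rotamers are counted).

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")   # the 20 standard one-letter codes
CORE = set("AVILFYWM")                      # hydrophobic set, 8 residues
SURFACE = set("ASTDNQERKH")                 # hydrophilic set, 10 residues
BOUNDARY = CORE | SURFACE                   # 17 residues (alanine is shared)
NO_CYS_PRO = AMINO_ACIDS - {"C", "P"}       # the 18-residue alternative set

ALLOWED = {"core": CORE, "surface": SURFACE, "boundary": BOUNDARY}


def allowed_residues(classification, drop_met=False):
    """Amino acids considered at a position of the given classification.
    drop_met models removing methionine from core positions (e.g. low vdW scaling)."""
    residues = set(ALLOWED[classification])
    if classification == "core" and drop_met:
        residues.discard("M")
    return residues


def sequence_space(classes):
    """Number of amino acid sequences (ignoring rotamers) for the given position classes."""
    total = 1
    for c in classes:
        total *= len(allowed_residues(c))
    return total


print(sequence_space(["core", "surface", "boundary"]))   # compare with 20 ** 3
```

Per-protein adjustments, such as adding hydrophobic residues to a surface position at a dimer interface, would amount to editing the set returned for that one position.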

In a preferred embodiment, proline, cysteine and glycine are not included in the list of possible amino acid side chains, and thus the rotamers for these side chains are not used. However, in a preferred embodiment, when the variable residue position has a φ angle (that is, the dihedral angle defined by 1) the carbonyl carbon of the preceding amino acid; 2) the nitrogen atom of the current residue; 3) the α-carbon of the current residue; and 4) the carbonyl carbon of the current residue) greater than 0°, the position is set to glycine to minimize backbone strain.
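The φ test can be sketched with a standard signed-dihedral formula, $\varphi = \operatorname{atan2}\big(|b_2|\, b_1 \cdot (b_2 \times b_3),\ (b_1 \times b_2)\cdot(b_2 \times b_3)\big)$ with $b_k$ the successive bond vectors, which follows the usual IUPAC sign convention. The function names and plain-tuple coordinates below are assumptions for illustration; a real pipeline would read the C(i-1), N(i), Cα(i) and C(i) coordinates from the template structure.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_angle(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i) and C(i), in that order."""
    return dihedral(c_prev, n, ca, c)


def should_force_glycine(phi_degrees):
    """Rule described above: a variable position with phi > 0 degrees is set to glycine."""
    return phi_degrees > 0.0


# Idealised geometry (not real coordinates): gives phi of +90 degrees.
phi = phi_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(phi, 6), should_force_glycine(phi))
```

Most residues in α-helices and β-strands have negative φ, so in practice the rule tends to flag the comparatively rare left-handed backbone conformations.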

Once the group of potential rotamers is assigned for each variable residue position, processing proceeds as outlined in U.S. Ser. No. 09/127,926 and PCT US98/07254. This processing step entails analyzing interactions of the rotamers with each other and with the protein backbone to generate optimized protein sequences. Simplistically, the processing initially comprises the use of a number of scoring functions to calculate energies of interactions of the rotamers, either to the backbone itself or other rotamers. Preferred PDA scoring functions include, but are not limited to, a van der Waals potential scoring function, a hydrogen bond potential scoring function, an atomic solvation scoring function, a secondary structure propensity scoring function and an electrostatic scoring function. As is further described below, at least one scoring function is used to score each position, although the scoring functions may differ depending on the position classification or other considerations, like favorable interaction with an α-helix dipole. As outlined below, the total energy which is used in the calculations is the sum of the energy of each scoring function used at a particular position, as is generally shown in Equation 1:

$$E_{\text{total}} = nE_{\text{vdw}} + nE_{\text{as}} + nE_{\text{h-bonding}} + nE_{\text{ss}} + nE_{\text{elec}} \qquad \text{(Equation 1)}$$

In Equation 1, the total energy is the sum of the energy of the van der Waals potential ($E_{\text{vdw}}$), the energy of atomic solvation ($E_{\text{as}}$), the energy of hydrogen bonding ($E_{\text{h-bonding}}$), the energy of secondary structure ($E_{\text{ss}}$) and the energy of electrostatic interaction ($E_{\text{elec}}$). The term $n$ is either 0 or 1, depending on whether the term is to be considered for the particular residue position.
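Equation 1 amounts to a switched sum: each term contributes only when its $n$ is 1 at that position. A minimal Python sketch, assuming the per-term energies for one position have already been computed and are passed in as a dictionary (the key names are hypothetical labels for the five terms):

```python
TERMS = ("vdw", "as", "h_bonding", "ss", "elec")


def total_energy(energies, switches):
    """Equation 1: sum of each scoring term multiplied by its 0/1 switch n.
    Terms missing from either dictionary contribute nothing."""
    for name in TERMS:
        if switches.get(name, 0) not in (0, 1):
            raise ValueError("each n must be 0 or 1")
    return sum(switches.get(name, 0) * energies.get(name, 0.0) for name in TERMS)


energies = {"vdw": -3.0, "as": 1.5, "h_bonding": -2.0, "ss": 0.5, "elec": -1.0}
all_on = {name: 1 for name in TERMS}
no_elec = dict(all_on, elec=0)   # e.g. a position where electrostatics is not scored
print(total_energy(energies, all_on), total_energy(energies, no_elec))
```

Keeping the switches per position allows, for example, electrostatics to be scored at surface positions only, in line with the position-dependent choice of scoring functions described above.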

As outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, 09/127,926, 60/104,612, 60/158,700, 09/419,351, 60/181,630, 60/186,904, U.S. patent application entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat), and PCT US98/07254, any of these scoring functions may be used, either alone or in combination. Once the scoring functions to be used are identified for each variable position, the preferred first step in the computational analysis comprises the determination of the interaction of each possible rotamer with all or part of the remainder of the protein. That is, the energy of interaction, as measured by one or more of the scoring functions, of each possible rotamer at each variable residue position with either the backbone or other rotamers is calculated. In a preferred embodiment, the interaction of each rotamer with the entire remainder of the protein, i.e. both the entire template and all other rotamers, is calculated. However, as outlined above, it is possible to model only a portion of a protein, for example a domain of a larger protein, and thus in some cases not all of the protein need be considered. The term “portion”, or similar grammatical equivalents thereof, as used herein with regard to a protein, refers to a fragment of that protein. This fragment may range in size from 5-10 amino acid residues to the entire amino acid sequence minus one amino acid. Similarly, the term “portion”, as used herein with regard to a nucleic acid, refers to a fragment of that nucleic acid. This fragment may range in size from 6-10 nucleotides to the entire nucleic acid sequence minus one nucleotide.

In a preferred embodiment, the first step of the computational processing is done by calculating two sets of interactions for each rotamer at every position: the interaction of the rotamer side chain with the template or backbone (the “singles” energy), and the interaction of the rotamer side chain with all other possible rotamers at every other position (the “doubles” energy), whether that position is varied or floated. It should be understood that the backbone in this case includes both the atoms of the protein structure backbone and the atoms of any fixed residues, wherein the fixed residues are defined as a particular conformation of an amino acid.

Thus, “singles” (rotamer/template) energies are calculated for the interaction of every possible rotamer at every variable residue position with the backbone, using some or all of the scoring functions. Thus, for the hydrogen bonding scoring function, every hydrogen bonding atom of the rotamer and every hydrogen bonding atom of the backbone is evaluated, and $E_{\text{HB}}$ is calculated for each possible rotamer at every variable position. Similarly, for the van der Waals scoring function, every atom of the rotamer is compared to every atom of the template (generally excluding the backbone atoms of its own residue), and $E_{\text{vdW}}$ is calculated for each possible rotamer at every variable residue position. In addition, generally no van der Waals energy is calculated if the atoms are connected by three bonds or less. For the atomic solvation scoring function, the surface of the rotamer is measured against the surface of the template, and $E_{\text{as}}$ for each possible rotamer at every variable residue position is calculated. The secondary structure propensity scoring function is also considered as a singles energy, and thus the total singles energy may contain an $E_{\text{ss}}$ term. As will be appreciated by those in the art, many of these energy terms will be close to zero, depending on the physical distance between the rotamer and the template position; that is, the farther apart the two moieties, the smaller the magnitude of the energy.

For the calculation of “doubles” energy (rotamer/rotamer), the interaction energy of each possible rotamer is compared with every possible rotamer at all other variable residue positions.

Thus, “doubles” energies are calculated for the interaction of every possible rotamer at every variable residue position with every possible rotamer at every other variable residue position, using some or all of the scoring functions. Thus, for the hydrogen bonding scoring function, every hydrogen bonding atom of the first rotamer and every hydrogen bonding atom of every possible second rotamer is evaluated, and $E_{\text{HB}}$ is calculated for each possible rotamer pair for any two variable positions. Similarly, for the van der Waals scoring function, every atom of the first rotamer is compared to every atom of every possible second rotamer, and $E_{\text{vdW}}$ is calculated for each possible rotamer pair at every two variable residue positions. For the atomic solvation scoring function, the surface of the first rotamer is measured against the surface of every possible second rotamer, and $E_{\text{as}}$ for each possible rotamer pair at every two variable residue positions is calculated. The secondary structure propensity scoring function need not be run as a “doubles” energy, as it is considered as a component of the “singles” energy. As will be appreciated by those in the art, many of these doubles energy terms will be close to zero, depending on the physical distance between the first rotamer and the second rotamer; that is, the farther apart the two moieties, the smaller the magnitude of the energy.
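The precomputation of both tables can be sketched as below. The two energy callables are hypothetical stand-ins for whichever scoring functions are switched on; the sketch only shows the bookkeeping: one singles entry per rotamer, and one doubles entry per rotamer pair for every pair of variable positions, stored once per unordered pair of positions (`i < j`). For $m_i$ rotamers at position $i$, the doubles table holds $\sum_{i<j} m_i m_j$ entries.

```python
from itertools import combinations


def build_energy_tables(rotamers, template_energy, pair_energy):
    """rotamers[i]: candidate rotamers at variable position i (any representation).
    template_energy(i, rot) -> singles energy of rot with the template.
    pair_energy(i, rot_i, j, rot_j) -> doubles energy between two rotamers.
    Returns (singles, doubles) with doubles[(i, j)][r][s] stored for i < j only."""
    singles = [[template_energy(i, rot) for rot in rots]
               for i, rots in enumerate(rotamers)]
    doubles = {}
    for i, j in combinations(range(len(rotamers)), 2):
        doubles[(i, j)] = [[pair_energy(i, ri, j, rj) for rj in rotamers[j]]
                           for ri in rotamers[i]]
    return singles, doubles


# Toy stand-ins: rotamers are plain numbers and the "energies" are arbitrary.
rotamers = [[0.0, 1.0], [0.0, 1.0, 2.0], [5.0]]
singles, doubles = build_energy_tables(
    rotamers,
    template_energy=lambda i, r: r,
    pair_energy=lambda i, ri, j, rj: abs(ri - rj),
)
print(singles)
print(sum(len(t) * len(t[0]) for t in doubles.values()))   # 2*3 + 2*1 + 3*1 entries
```

Because every later step (DEE, Monte Carlo, enumeration) only reads these tables, the expensive atom-level scoring is done once, which is what makes the subsequent searches fast.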

In addition, as will be appreciated by those in the art, a variety of force fields can be used in the PDA calculations, including, but not limited to, Dreiding I and Dreiding II [Mayo et al., J. Phys. Chem. 94:8897 (1990)], AMBER [Weiner et al., J. Amer. Chem. Soc. 106:765 (1984) and Weiner et al., J. Comp. Chem. 7:230 (1986)], MM2 [Allinger, J. Am. Chem. Soc. 99:8127 (1977); Liljefors et al., J. Comp. Chem. 8:1051 (1987)]; MMP2 [Sprague et al., J. Comp. Chem. 8:581 (1987)]; CHARMM [Brooks et al., J. Comp. Chem. 4:187 (1983)]; GROMOS; MM3 [Allinger et al., J. Amer. Chem. Soc. 111:8551 (1989)]; OPLS-AA [Jorgensen et al., J. Am. Chem. Soc. 118:11225-11236 (1996); Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)]; OPLS [Jorgensen et al., J. Am. Chem. Soc. 110:1657ff (1988); Jorgensen et al., J. Am. Chem. Soc. 112:4768ff (1990)]; UNRES [United Residue Forcefield; Liwo et al., Protein Science 2:1697-1714 (1993); Liwo et al., Protein Science 2:1715-1731 (1993); Liwo et al., J. Comp. Chem. 18:849-873 (1997); Liwo et al., J. Comp. Chem. 18:874-884 (1997); Liwo et al., J. Comp. Chem. 19:259-276 (1998)]; Forcefield for Protein Structure Prediction [Liwo et al., Proc. Natl. Acad. Sci. U.S.A. 96:5482-5485 (1999)]; ECEPP/3 [Liwo et al., J. Protein Chem. 13(4):375-80 (1994)]; AMBER 1.1 force field [Weiner et al., J. Am. Chem. Soc. 106:765-784]; AMBER 3.0 force field [U.C. Singh et al., Proc. Natl. Acad. Sci. U.S.A. 82:755-759]; CHARMM and CHARMM22 [Brooks et al., J. Comp. Chem. 4:187-217]; cvff3.0 [Dauber-Osguthorpe et al., Proteins: Structure, Function and Genetics 4:31-47 (1988)]; cff91 [Maple et al., J. Comp. Chem. 15:162-182]; also, the DISCOVER (cvff and cff91) and AMBER forcefields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego, Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego, Calif.), all of which are expressly incorporated by reference.

Once the singles and doubles energies are calculated and stored, the next step of the computational processing may occur. As outlined in U.S. Ser. No. 09/127,926 and PCT US98/07254, preferred embodiments utilize a Dead End Elimination (DEE) step, and preferably a Monte Carlo step.

PDA, viewed broadly, has three components that may be varied to alter the output (e.g. the primary library): the scoring functions used in the process, the filtering technique, and the sampling technique.

In a preferred embodiment, the scoring functions may be altered. In a preferred embodiment, the scoring functions outlined above may be biased or weighted in a variety of ways. For example, a bias towards or away from a reference sequence or family of sequences can be applied; for example, a bias towards wild-type or homolog residues may be used. Similarly, the entire protein or a fragment of it may be biased; for example, the active site may be biased towards wild-type residues, or domain residues may be biased towards a particular desired physical property. Furthermore, a bias towards or against increased energy can be generated. Additional scoring function biases include, but are not limited to, applying electrostatic potential gradients or hydrophobicity gradients, adding a substrate or binding partner to the calculation, or biasing towards a desired charge or hydrophobicity.
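One simple way to realize a bias towards a reference sequence is to lower the singles energy of every rotamer whose amino acid matches the reference residue, optionally only at selected positions such as an active site. The Python sketch below assumes that approach; the flat `bonus`, the `positions` argument and the data layout are hypothetical choices, and a negative `bonus` would bias away from the reference instead.

```python
def bias_toward_reference(singles, rotamer_residues, reference, bonus=1.0, positions=None):
    """Return a copy of the singles table in which rotamers whose amino acid matches
    the reference residue are made more favourable (lower energy) by `bonus`.

    rotamer_residues[i][r]: one-letter amino acid of rotamer r at position i.
    reference[i]: reference (e.g. wild-type) residue at position i.
    positions: optional set of positions to bias (e.g. an active site); None = all.
    """
    biased = [list(row) for row in singles]
    for i, row in enumerate(biased):
        if positions is not None and i not in positions:
            continue
        for r, aa in enumerate(rotamer_residues[i]):
            if aa == reference[i]:
                row[r] -= bonus   # lower energy = more favourable
    return biased


singles = [[0.0, 0.0], [2.0, 2.0]]
rotamer_residues = [["A", "L"], ["K", "E"]]
print(bias_toward_reference(singles, rotamer_residues, "LE"))
print(bias_toward_reference(singles, rotamer_residues, "LE", positions={0}))
```

Applying the bias to the singles table leaves the doubles table untouched, so the biased problem can be passed unchanged to the same DEE and sampling steps.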

In addition, in an alternative embodiment, there are a variety of additional scoring functions that may be used. Additional scoring functions include, but are not limited to, torsional potentials, residue pair potentials, or residue entropy potentials. Such additional scoring functions can be used alone, or as functions for processing the library after it is scored initially. For example, a variety of functions derived from data on binding of peptides to MHC (Major Histocompatibility Complex) can be used to rescore a library in order to eliminate proteins containing sequences which can potentially bind to MHC, i.e. potentially immunogenic sequences.

In a preferred embodiment, a variety of filtering techniques can be used, including, but not limited to, DEE and its related counterparts. Additional filtering techniques include, but are not limited to, branch-and-bound techniques for finding optimal sequences (Gordon and Mayo, Structure Fold. Des. 7:1089-98, 1999), and exhaustive enumeration of sequences.
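Exhaustive enumeration is the simplest of these filters and is only practical for very small problems, since it visits all $\prod_i m_i$ rotamer assignments; it is nonetheless useful as a reference answer when checking DEE or branch-and-bound on toy cases. A minimal Python sketch, assuming the same hypothetical table layout (`singles[i][r]`, `doubles[(i, j)][r][s]` for `i < j`):

```python
from itertools import product


def sequence_energy(assign, singles, doubles):
    """Singles plus doubles energy of one rotamer assignment."""
    energy = sum(singles[i][r] for i, r in enumerate(assign))
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            energy += doubles[(i, j)][assign[i]][assign[j]]
    return energy


def exhaustive_minimum(singles, doubles):
    """Score every rotamer assignment and return (best_assignment, best_energy)."""
    best = None
    for assign in product(*(range(len(row)) for row in singles)):
        energy = sequence_energy(assign, singles, doubles)
        if best is None or energy < best[1]:
            best = (assign, energy)
    return best


singles = [[0.0, 5.0], [0.0, 1.0]]
doubles = {(0, 1): [[0.0, 1.0], [2.0, 0.0]]}
print(exhaustive_minimum(singles, doubles))
```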

As will be appreciated by those in the art, once an optimized sequence or set of sequences is generated, a variety of sequence space sampling methods can be used, either in addition to the preferred Monte Carlo methods, or instead of a Monte Carlo search. That is, once a sequence or set of sequences is generated, preferred methods utilize sampling techniques to allow the generation of additional, related sequences for testing.

These sampling methods can include the use of amino acid substitutions, insertions or deletions, or recombinations of one or more sequences. As outlined herein, a preferred embodiment utilizes a Monte Carlo search, which is a series of biased, systematic, or random jumps. However, there are other sampling techniques that can be used, including Boltzmann sampling, genetic algorithm techniques and simulated annealing. In addition, for all the sampling techniques, the kinds of jumps allowed can be altered (e.g. random jumps to random residues, biased jumps (to or away from wild-type, for example), jumps to biased residues (to or away from similar residues, for example), etc.). Jumps may also be made in which multiple residue positions are coupled (two residues always change together, or never change together), or in which whole sets of residues change to other sequences (e.g., recombination). Similarly, for all the sampling techniques, the acceptance criteria of whether a sampling jump is accepted can be altered.

In addition, it should be noted that the preferred methods of the invention result in a rank-ordered list of sequences; that is, the sequences are ranked on the basis of some objective criteria. However, as outlined herein, it is possible to create a set of non-ordered sequences, for example by generating a probability table directly (for example using SCMF analysis or sequence alignment techniques) that lists sequences without ranking them. The sampling techniques outlined herein can be used in either situation.

In a preferred embodiment, Boltzmann sampling is done. As will be appreciated by those in the art, the temperature criteria for Boltzmann sampling can be altered to allow broad searches at high temperature and narrow searches close to local optima at low temperature (see, e.g., Metropolis et al., J. Chem. Phys. 21:1087, 1953).
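The temperature dependence described above follows from the Metropolis acceptance rule: a jump that raises the energy by ΔE is accepted with probability exp(-ΔE/T). A minimal sketch, assuming energies and temperature are expressed in the same units (Boltzmann constant set to 1):

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill jumps; accept uphill
    jumps with Boltzmann probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def acceptance_rate(delta_e, temperature, trials=10000, seed=0):
    """Empirical fraction of accepted jumps for a fixed energy increase."""
    rng = random.Random(seed)
    hits = sum(metropolis_accept(delta_e, temperature, rng) for _ in range(trials))
    return hits / trials


# High temperature: broad search (most uphill jumps of +1 are accepted;
# expected rate exp(-0.1), about 0.90).
print(acceptance_rate(1.0, 10.0))
# Low temperature: narrow search (nearly all uphill jumps are rejected;
# expected rate exp(-10), about 4.5e-5).
print(acceptance_rate(1.0, 0.1))
```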

In a preferred embodiment, the sampling technique uses genetic algorithms, such as those described by Holland (Adaptation in Natural and Artificial Systems, 1975, Ann Arbor, U. Michigan Press). Genetic algorithm analysis generally takes generated sequences and recombines them computationally, in a manner similar to a nucleic acid recombination event or to “gene shuffling”. Thus the “jumps” of genetic algorithm analysis generally are multiple-position jumps. In addition, as outlined below, correlated multiple jumps may also be done. Such jumps can occur with different crossover positions and more than one recombination at a time, and can involve recombination of two or more sequences. Furthermore, deletions or insertions (random or biased) can be done. In addition, as outlined below, genetic algorithm analysis may also be used after the secondary library has been generated.
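Multi-point crossover, the recombination “jump” described above, can be sketched together with a minimal generational loop. The toy fitness function, population size, mutation rate and truncation selection scheme are illustrative assumptions, not parameters from this document; a real application would score sequences with the design energy function.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet


def crossover(p1, p2, n_points, rng):
    """Multi-point crossover of two equal-length parent sequences."""
    cuts = sorted(rng.sample(range(1, len(p1)), n_points))
    pieces, use_first, start = [], True, 0
    for cut in cuts + [len(p1)]:
        pieces.append((p1 if use_first else p2)[start:cut])
        use_first = not use_first
        start = cut
    return "".join(pieces)


def mutate(seq, rate, rng):
    """Point mutations applied independently at each position."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in seq)


def evolve(population, fitness, generations, rng, n_points=2, rate=0.02):
    """Keep the fitter half each generation and refill the population with
    recombined, mutated children (truncation selection)."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = [
            mutate(crossover(*rng.sample(parents, 2), n_points, rng), rate, rng)
            for _ in range(len(population) - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


# Toy fitness (hypothetical): identity to an arbitrary target sequence.
TARGET = "ACDEFGHIKLMNPQ"


def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


rng = random.Random(0)
start = ["".join(rng.choice(AMINO_ACIDS) for _ in TARGET) for _ in range(20)]
best = evolve(start, toy_fitness, generations=50, rng=rng)
print(best, toy_fitness(best))
```

Because the fitter half is carried over unchanged, the best score never decreases from one generation to the next; other selection schemes (e.g., fitness-proportional) are equally compatible with the text.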

In a preferred embodiment, the sampling technique uses simulated annealing, such as described by Kirkpatrick et al. [Science 220:671-680 (1983)]. Simulated annealing alters the cutoff for accepting good or bad jumps by altering the temperature; that is, the stringency of the cutoff changes with the temperature. This allows broad searches at high temperature into new areas of sequence space, alternating with narrow searches at low temperature that explore regions in detail.
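The temperature schedule described above can be sketched as follows. The geometric cooling schedule, the starting and final temperatures, the single-substitution move and the toy energy (mismatches to an arbitrary target) are all assumptions chosen for illustration; Kirkpatrick et al. describe the general method, not these settings.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet


def propose(seq, rng):
    """Single random substitution (one possible jump type)."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def simulated_annealing(start, energy, t_start=10.0, t_end=0.01,
                        steps=5000, seed=0):
    """Metropolis sampling while the temperature decays geometrically
    from t_start to t_end; returns the lowest-energy sequence seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    cooling = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    temperature = t_start
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = energy(candidate) - e_current
        # Loose cutoff at high T (broad search), strict at low T (local refinement).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = candidate, e_current + delta
            if e_current < e_best:
                best, e_best = current, e_current
        temperature *= cooling
    return best, e_best


# Toy energy (hypothetical): number of mismatches to an arbitrary target.
TARGET = "WYLIVAKE"


def toy_energy(seq):
    return sum(a != b for a, b in zip(seq, TARGET))


print(simulated_annealing("AAAAAAAA", toy_energy))
```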

In addition, as outlined below, these sampling methods can be used to further process a first set to generate additional sets of GHA proteins.

The computational processing results in a set of optimized GHA protein sequences. These optimized GHA protein sequences generally differ significantly from the wild-type GH sequence from which the backbone was taken. That is, each optimized GHA protein sequence preferably comprises at least about 3-10% variant amino acids relative to the starting or wild-type sequence, with at least about 10-15% being preferred, at least about 15-20% being more preferred, and at least 25% being particularly preferred.
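The percentage of variant amino acids is the fraction of aligned positions at which the optimized sequence differs from the starting sequence. A minimal sketch, assuming the two sequences are already aligned position by position with no insertions or deletions; the example sequences are hypothetical.

```python
def percent_variant(variant: str, wild_type: str) -> float:
    """Percent of positions at which variant differs from wild_type.

    Assumes equal-length, gap-free sequences aligned position by position.
    """
    if len(variant) != len(wild_type):
        raise ValueError("sequences must be aligned and of equal length")
    differences = sum(a != b for a, b in zip(variant, wild_type))
    return 100.0 * differences / len(wild_type)


# Hypothetical 20-residue example with 3 substitutions.
print(percent_variant("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLANPQRSAVWA"))  # 15.0
```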

In a preferred embodiment, the GHA proteins of the invention have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 residues that differ from the hGH sequence.

Thus, in the broadest sense, the present invention is directed to GHA proteins that have GH activity. By “GH activity” or “GHA” herein is meant that the protein exhibits at least one, and preferably more, of the biological functions of a growth hormone, as defined below. In one embodiment, the biological function of a GHA protein is altered, preferably improved, relative to the corresponding biological activity of a GH.

By “protein” herein is meant at least two covalently attached amino acids, which includes proteins, polypeptides, oligopeptides and peptides. The protein may be made up of naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures, i.e., “analogs” such as peptoids [see Simon et al., Proc. Natl. Acad. Sci. U.S.A. 89(20):9367-71 (1992)], generally depending on the method of synthesis. Thus “amino acid”, or “peptide residue”, as used herein means both naturally occurring and synthetic amino acids. For example, homophenylalanine, citrulline, and norleucine are considered amino acids for the purposes of the invention. “Amino acid” also includes imino acid residues such as proline and hydroxyproline. In addition, any amino acid representing a component of the GHA proteins can be replaced by the same amino acid but of the opposite chirality. Thus, any amino acid naturally occurring in the L-configuration (which may also be referred to as the R or S, depending upon the structure of the chemical entity) may be replaced with an amino acid of the same chemical structural type but of the opposite chirality, generally referred to as the D-amino acid, but which can additionally be referred to as the R- or the S-, depending upon its composition and chemical configuration. Such derivatives have the property of greatly increased stability, and therefore are advantageous in the formulation of compounds which may have longer in vivo half-lives when administered by oral, intravenous, intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes. In the preferred embodiment, the amino acids are in the (S) or L-configuration. If non-naturally occurring side chains are used, non-amino acid substituents may be used, for example to prevent or retard in vivo degradation. Proteins including non-naturally occurring amino acids may be synthesized or, in some cases, made recombinantly; see van Hest et al., FEBS Lett. 428(1-2):68-70 (May 22, 1998) and Tang et al., Abstr. Pap. Am. Chem. S. 218:U138, Part 2 (Aug. 22, 1999), both of which are expressly incorporated by reference herein.

Additionally, modified amino acids or chemical derivatives of amino acids of consensus sequences or fragments of GHA proteins according to the present invention may be provided; such polypeptides contain additional chemical moieties or modified amino acids not normally part of the protein. Covalent and non-covalent modifications of the protein are thus included within the scope of the present invention. Such modifications may be introduced into a GHA polypeptide by reacting targeted amino acid residues of the polypeptide with an organic derivatizing agent that is capable of reacting with selected side chains or terminal residues. The following examples of chemical derivatives are provided by way of illustration and not by way of limitation.

Aromatic amino acids may be replaced with D- or L-naphthylalanine, D- or L-phenylglycine, D- or L-2-thienylalanine, D- or L-1-, 2-, 3- or 4-pyrenylalanine, D- or L-3-thienylalanine, D- or L-(2-pyridinyl)-alanine, D- or L-(3-pyridinyl)-alanine, D- or L-(2-pyrazinyl)-alanine, D- or L-(4-isopropyl)-phenylglycine, D-(trifluoromethyl)-phenylglycine, D-(trifluoromethyl)-phenylalanine, D-p-fluorophenylalanine, D- or L-p-biphenylphenylalanine, D- or L-p-methoxybiphenylphenylalanine, D- or L-2-indole(alkyl)alanines, and D- or L-alkylalanines, where alkyl may be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, iso-butyl, sec-isotyl, iso-pentyl, or non-acidic amino acids, of C1-C20.

Acidic amino acids can be substituted with non-carboxylate amino acids that maintain a negative charge, and derivatives or analogs thereof, such as the non-limiting examples of (phosphono)alanine, glycine, leucine, isoleucine, threonine, or serine; or sulfated (e.g., -SO₃H) threonine, serine, or tyrosine.

Other substitutions may include unnatural hydroxylated amino acids that may be made by combining “alkyl” with any natural amino acid. The term “alkyl” as used herein refers to a branched or unbranched saturated hydrocarbon group of 1 to 24 carbon atoms, such as methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, octyl, decyl, tetradecyl, hexadecyl, eicosyl, tetracosyl and the like. Preferred alkyl groups herein contain 1 to 12 carbon atoms. Also included within the definition of an alkyl group are cycloalkyl groups such as C5 and C6 rings, and heterocyclic rings with nitrogen, oxygen, sulfur or phosphorus. Alkyl also includes heteroalkyl, with heteroatoms of sulfur, oxygen, and nitrogen being preferred. Alkyl includes substituted alkyl groups. By “substituted alkyl group” herein is meant an alkyl group further comprising one or more substitution moieties. A preferred heteroalkyl group is an alkyl amine. By “alkyl amine” or grammatical equivalents herein is meant an alkyl group as defined above, substituted with an amine group at any position. In addition, the alkyl amine may have other substitution groups, as outlined above for alkyl groups. The amine may be primary (-NH₂R), secondary (-NHR₂), or tertiary (-NR₃). Basic amino acids may be substituted with alkyl groups at any position of the naturally occurring amino acids lysine, arginine, ornithine, citrulline, or (guanidino)-acetic acid, or other (guanidino)alkyl-acetic acids, where “alkyl” is defined as above. Nitrile derivatives (e.g., containing the CN moiety in place of COOH) may also be substituted for asparagine or glutamine, and methionine sulfoxide may be substituted for methionine. Methods of preparation of such peptide derivatives are well known to one skilled in the art.

In addition, any amide linkage in any of the GHA polypeptides can be replaced by a ketomethylene moiety. Such derivatives are expected to have increased stability to degradation by enzymes, and therefore possess advantages for the formulation of compounds which may have increased in vivo half-lives when administered by oral, intravenous, intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes.

Additional modifications of amino acids of GHA polypeptides according to the present invention may include the following. Cysteinyl residues may be reacted with alpha-haloacetates (and corresponding amines), such as 2-chloroacetic acid or chloroacetamide, to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues may also be derivatized by reaction with compounds such as bromotrifluoroacetone, alpha-bromo-beta-(5-imidazoyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, methyl 2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole.

Histidyl residues may be derivatized by reaction with compounds such as diethyl pyrocarbonate, e.g., at pH 5.5-7.0, because this agent is relatively specific for the histidyl side chain; para-bromophenacyl bromide may also be used, e.g., where the reaction is preferably performed in 0.1 M sodium cacodylate at pH 6.0.

Lysinyl and amino-terminal residues may be reacted with compounds such as succinic or other carboxylic acid anhydrides. Derivatization with these agents is expected to have the effect of reversing the charge of the lysinyl residues. Other suitable reagents for derivatizing alpha-amino-containing residues include compounds such as imidoesters, e.g., methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride; trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed reaction with glyoxylate.

Arginyl residues may be modified by reaction with one or several conventional reagents, among them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin, according to known method steps. Derivatization of arginine residues requires that the reaction be performed in alkaline conditions because of the high pKa of the guanidine functional group. Furthermore, these reagents may react with the epsilon-amino group of lysine as well as with the arginine guanidino group.

The specific modification of tyrosyl residues per se is well known, such as for introducing spectral labels into tyrosyl residues by reaction with aromatic diazonium compounds or tetranitromethane. N-acetylimidazole and tetranitromethane may be used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively.

Carboxyl side groups (aspartyl or glutamyl) may be selectively modified by reaction with carbodiimides (R-N=C=N-R′) such as 1-cyclohexyl-3-(2-morpholinyl-(4-ethyl)) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl) carbodiimide. Furthermore, aspartyl and glutamyl residues may be converted to asparaginyl and glutaminyl residues by reaction with ammonium ions.

Glutaminyl and asparaginyl residues are frequently deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues may be deamidated under mildly acidic conditions. Either form of these residues falls within the scope of the present invention.

The GH may be from any number of organisms, with GHs from mammals being particularly preferred. Suitable mammals include, but are not limited to, rodents (rats, mice, hamsters, guinea pigs, etc.), primates, farm animals (including sheep, goats, pigs, cows, horses, etc.) and, in the most preferred embodiment, humans (human GH is sometimes referred to herein as hGH; its sequence is depicted in FIG. 1A (SEQ ID NO:1)). As will be appreciated by those in the art, GHA proteins based on GHs from mammals other than humans may find use in animal models of human disease. The GenBank numbers for a variety of mammalian GH species are as follows: bovine, RIBOS1, STBO, CAA00787, JC1316; dog, 146145, S35790; sheep, S33339, STSH; cat, JC4632, P46404; pig, STPG; mouse, STMS, P06880; rat, STRT, P01244; Rhesus macaque, I67411, I67410; horse, STHO, P01245; human, P01241, STUV2 (somatotropin 2 precursor, splice form 2), STUV (somatotropin 2 precursor), STHU (somatotropin 1 precursor).

The GHA proteins of the invention exhibit at least one biological function of a GH protein. By “growth hormone” or “GH” herein is meant a wild type GH or an allelic variant thereof. Thus, GH refers to all forms of growth hormone that are active in accepted GH assays (for examples of assays, see U.S. Pat. Nos. 4,658,021, 4,665,160, 5,068,317, 5,079,345, 5,424,199, 5,534,617, 5,597,709, 5,612,315, 5,633,352, 5,635,604, 5,688,666 and references cited therein).

The GHA proteins of the invention exhibit at least one biological function of GH. By “biological function” or “biological property” herein is meant any one of the properties or functions of GH, including, but not limited to, the ability to bind to a GH receptor; the ability to bind to a prolactin receptor; the ability to induce dimerization of a growth hormone receptor; the ability to induce dimerization of a prolactin receptor; the ability to bind to a cell comprising a growth hormone receptor; the ability to bind to a cell comprising a prolactin receptor; the ability to induce cell proliferation; the ability to show efficacy in the treatment of hypochondroplasia or idiopathic short stature; the ability to show efficacy in the treatment of Turner's syndrome; the ability to show efficacy in the treatment of growth delay in burned children; the ability to show efficacy in GH replacement therapies in GH-deficient adults; the ability to show efficacy in the treatment of muscle wasting under conditions including, but not limited to, surgical stress, renal failure, muscular dystrophy, glucocorticoid administration or HIV infection; the ability to show efficacy in the treatment of congestive heart failure or in cardiovascular drug therapy; the ability to show efficacy in the treatment of bone diseases or osteoporosis; the ability to show efficacy in the treatment of disorders affecting puberty or reproduction; the ability to show efficacy in GH therapy in elderly people; the ability to show efficacy in wound healing, including but not limited to the treatment of stasis ulcers, decubitus ulcers, or diabetic ulcers; the ability to show efficacy in the treatment of diffuse gastric bleeding; the ability to show efficacy in the treatment of disorders relating to general anabolism, including, but not limited to, pseudoarthrosis, burn therapy, and old-age cachectic states; the ability to show efficacy in the post-surgical (trauma) healing process; the ability to show efficacy in total parenteral nutrition (TPN); the ability to show efficacy in the treatment of breast cancer; the ability to show efficacy in the treatment of Prader-Willi syndrome; the ability to show efficacy in the reconstitution of the immune system; the ability to show efficacy in the treatment of obesity; and the ability to show efficacy in the treatment of Russell-Silver syndrome.

In one embodiment, GHA proteins will exhibit at least 10% of the receptor binding or biological activity of the wild type GH. More preferred are GHA proteins that exhibit at least 50%, even more preferred are GHA proteins that exhibit at least 90%, and most preferred are GHA proteins that exhibit more than 100% of the receptor binding or biological activity of the wild type GH. Biological assays, including receptor binding assays, are described in U.S. Pat. Nos. 4,658,021, 4,665,160, 5,068,317, 5,079,345, 5,424,199, 5,534,617, 5,597,709, 5,612,315, 5,633,352, 5,635,604, 5,688,666 and references cited therein, and Rowlinson et al. [J. Biol. Chem. 270(28):16833-16839 (1995); Endocrinology 137(1):90-5 (1996)], all of which are expressly incorporated by reference.

In one embodiment, at least one biological property of the GHA protein is altered when compared to the same property of GH. As outlined above, the invention provides GHA nucleic acids encoding GHA polypeptides. The GHA polypeptide preferably has at least one property that is substantially different from the same property of the corresponding naturally occurring GH polypeptide. The property of the GHA polypeptide is the result of the PDA analysis of the present invention.

The term “altered property” or grammatical equivalents thereof in the context of a polypeptide, as used herein, refers to any characteristic or attribute of a polypeptide that can be selected or detected and compared to the corresponding property of a naturally occurring protein. These properties include, but are not limited to, oxidative stability, substrate specificity, catalytic activity, thermal stability, alkaline stability, pH activity profile, resistance to proteolytic degradation, Km, kcat, Km/kcat ratio, kinetic association (K_(on)) and dissociation (K_(off)) rates, protein folding, ability to induce an immune response, ability to bind to a ligand, ability to bind to a receptor, ability to be secreted, ability to be displayed on the surface of a cell, ability to oligomerize, ability to signal, ability to stimulate cell proliferation, ability to inhibit cell proliferation, ability to induce apoptosis, ability to be modified by phosphorylation or glycosylation, and ability to treat disease.

Unless otherwise specified, a substantial change in any of the above-listed properties, when comparing the property of a GHA polypeptide to the property of a naturally occurring GH protein, is preferably at least a 20%, more preferably at least a 50%, and even more preferably at least a 2-fold increase or decrease.
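The thresholds above can be applied mechanically to a measured property. This sketch assumes the property is a positive number and reads a “2-fold decrease” as the GHA value being half the GH value or less; both readings are interpretations for illustration, not definitions from the text.

```python
def change_tier(gha_value: float, gh_value: float) -> str:
    """Classify the change in a property of a GHA protein relative to GH."""
    if gha_value <= 0 or gh_value <= 0:
        raise ValueError("property values must be positive")
    percent_change = abs(gha_value - gh_value) / gh_value
    # Symmetric fold change: 2.0 means twice as high or half as high.
    fold_change = max(gha_value / gh_value, gh_value / gha_value)
    if fold_change >= 2.0:
        return "at least 2-fold"
    if percent_change >= 0.5:
        return "at least 50%"
    if percent_change >= 0.2:
        return "at least 20%"
    return "not substantial"


print(change_tier(1.25, 1.0))  # at least 20%
print(change_tier(0.4, 1.0))   # at least 2-fold (a decrease)
```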

A change in oxidative stability is evidenced by at least about a 20%, more preferably at least a 50%, increase in the activity of a GHA protein when exposed to various oxidizing conditions, as compared to that of a GH. Oxidative stability is measured by known procedures.

A change in alkaline stability is evidenced by at least about a 5% or greater increase or decrease (preferably an increase) in the half-life of the activity of a GHA protein when exposed to increasing or decreasing pH conditions, as compared to that of a GH. Generally, alkaline stability is measured by known procedures.

A change in thermal stability is evidenced by at least about a 5% or greater increase or decrease (preferably an increase) in the half-life of the activity of a GHA protein when exposed to a relatively high temperature and neutral pH, as compared to that of GH. Generally, thermal stability is measured by known procedures.

Similarly, GHA proteins may be experimentally tested and validated in in vivo and in vitro assays. Suitable assays include, but are not limited to, examining their binding affinity to naturally occurring or variant receptors and to high-affinity agonists and/or antagonists. In addition to cell-free biochemical affinity tests, quantitative comparisons are made of the kinetic and equilibrium binding constants of the natural receptor for the naturally occurring GH and for the GHA proteins. The kinetic association rate (K_(on)), the dissociation rate (K_(off)), and the equilibrium binding constant (K_(d)) can be determined using surface plasmon resonance on a BIAcore instrument following the standard procedure in the literature [Pearce et al., Biochemistry 38:81-89 (1999)]. The binding constant between a natural receptor and its corresponding naturally occurring GH is compared with the binding constant between a naturally occurring receptor and a GHA protein in order to evaluate the sensitivity and specificity of the GHA protein. Preferably, the binding affinity of the GHA protein for naturally occurring receptors, variant receptors and agonists increases relative to the naturally occurring GH, while its affinity for antagonists decreases. GHA proteins with higher affinity for antagonists relative to hGH may also be generated by the methods of the invention.
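For a simple 1:1 interaction, the equilibrium dissociation constant follows from the two rate constants as K_d = K_off/K_on, so a lower K_d means tighter binding. The rate values below are hypothetical placeholders, not measurements of hGH or any GHA protein; a real SPR analysis would obtain them by fitting sensorgrams.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d (M) from k_on (1/(M*s)) and k_off (1/s), assuming 1:1 binding."""
    return k_off / k_on


def relative_affinity(kd_reference: float, kd_variant: float) -> float:
    """Ratio > 1 means the variant binds more tightly than the reference."""
    return kd_reference / kd_variant


# Hypothetical rate constants, for illustration only.
kd_gh = dissociation_constant(k_on=2.0e5, k_off=4.0e-4)   # 2e-9 M (2 nM)
kd_gha = dissociation_constant(k_on=4.0e5, k_off=2.0e-4)  # 5e-10 M (0.5 nM)
print(relative_affinity(kd_gh, kd_gha))  # about 4
```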

In a preferred embodiment, the biological function of a GHA protein is defined as the ability of the polypeptide of the invention to bind to a cell that comprises a growth hormone receptor, a prolactin receptor, or any other receptor to which the naturally occurring GH binds. GenBank accession numbers for GH binding receptors (GHR) are available for various species, e.g., human, A33991, S04530, P09587, P01242, CAA77872, CAA77877, CAA77876; pig, S12136; rat, I57940, A33505. GenBank accession numbers for the prolactin receptor are available for several species, including human, A40144, NP_000940, P16471; mouse, I153269, I77525, I77524; rat, A41070, A36116, A29884; bovine, 14597, AAB97748, AAB97747; chicken, JQ1655; Xenopus laevis, BAA90400. Either of these receptors may be used in binding assays with a GHA protein. However, in some embodiments, GHA proteins may not possess this activity.

In a preferred embodiment, the assay system used to determine GHA is an in vitro system using cells that either express endogenous human growth hormone or human prolactin receptors, or are stably transfected with a gene encoding the human growth hormone receptor and/or the human prolactin receptor. In this system, cell proliferation is measured as a function of the incorporation of BrdU, which is incorporated into the DNA of proliferating cells. An increase above background of at least about 10%, with at least about 20% being preferred, at least about 30% being more preferred, and at least about 50%, 75% and 90% being especially preferred, is an indication of GHA.

In a preferred embodiment, the antigenic profile of the GHA protein in the host animal is similar, and preferably identical, to the antigenic profile of the host GH; that is, the GHA protein does not significantly stimulate the host organism (e.g., the patient) to mount an immune response: any immune response is not clinically relevant, and there is no allergic response or neutralization of the protein by an antibody. That is, in a preferred embodiment, the GHA protein does not contain additional or different epitopes from the GH. By “epitope” or “determinant” herein is meant a portion of a protein which will generate and/or bind an antibody. Thus, in most instances, no significant amount of antibodies is generated to a GHA protein. In general, this is accomplished by not significantly altering surface residues, as outlined below, and by not adding any amino acid residues on the surface that can become glycosylated, as novel glycosylation can result in an immune response.

The GHA proteins and nucleic acids of the invention are distinguishable from naturally occurring GHs. By “naturally occurring” or “wild type” or grammatical equivalents herein is meant an amino acid sequence or a nucleotide sequence that is found in nature and includes allelic variations; that is, an amino acid sequence or a nucleotide sequence that usually has not been intentionally modified. Accordingly, by “non-naturally occurring”, “synthetic”, or “recombinant” or grammatical equivalents thereof, herein is meant an amino acid sequence or a nucleotide sequence that is not found in nature; that is, an amino acid sequence or a nucleotide sequence that usually has been intentionally modified. Representative amino acid and nucleotide sequences of a naturally occurring human growth hormone (hGH) are shown in FIGS. 1A-1C (SEQ ID NO:1, SEQ ID NO:14 and SEQ ID NO:2). It should be noted that, unless otherwise stated, all positional numbering of GHA proteins and GHA nucleic acids is based on these sequences. That is, as will be appreciated by those in the art, an alignment of GH proteins and GHA proteins can be done using standard programs, as is outlined below, with the identification of “equivalent” positions between the two proteins. Thus, the GHA proteins and nucleic acids of the invention are non-naturally occurring; that is, they do not exist in nature.

Thus, in a preferred embodiment, the GHA protein has an amino acid sequence that differs from a wild-type GH sequence by at least 3% of the residues. That is, the GHA proteins of the invention are less than about 97% identical to a GH amino acid sequence. Accordingly, a protein is a “GHA protein” if the overall homology of the protein sequence to the amino acid sequence shown in FIG. 1A (SEQ ID NO:1) or FIG. 1B (SEQ ID NO:14) is preferably less than about 97%, more preferably less than about 95%, even more preferably less than about 90% and most preferably less than 85%. In some embodiments the homology will be as low as about 75 to 80%. Stated differently, based on the sequence of the secreted, mature form of hGH, comprising 191 residues (residues 27 to 217 of FIG. 1A (SEQ ID NO:1); FIG. 1B (SEQ ID NO:14)), GHA proteins have at least about 5-6 residues that differ from the hGH sequence (3%), with GHA proteins having from 5 residues to upwards of 40 residues that differ from the hGH sequence. Preferred GHA proteins have 5-40 different residues, with from about 5 to about 25 being preferred (that is, about 3-13% of the protein is not identical to hGH), and from about 10 to 25 being particularly preferred (that is, about 5-13% of the protein is not identical to hGH).
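The correspondence between identity thresholds and residue counts stated above can be checked arithmetically for the 191-residue mature hGH. A minimal sketch, assuming a gap-free, position-by-position comparison over all 191 residues:

```python
import math

MATURE_HGH_LENGTH = 191  # residues in secreted, mature hGH (from the text)


def min_differences(max_identity_pct: float, length: int = MATURE_HGH_LENGTH) -> int:
    """Smallest number of differing residues that brings identity strictly
    below max_identity_pct, i.e. (length - d) / length * 100 < max_identity_pct."""
    return math.floor(length * (1 - max_identity_pct / 100)) + 1


def percent_different(n_differences: int, length: int = MATURE_HGH_LENGTH) -> float:
    """Percent of the mature sequence that is not identical to hGH."""
    return 100.0 * n_differences / length


for pct in (97, 95, 90, 85):
    print(f"identity < {pct}%: at least {min_differences(pct)} differences")
for n in (5, 10, 25, 40):
    print(f"{n} differences = {percent_different(n):.1f}% of residues")
```

Under these assumptions, identity below 97% requires at least 6 of 191 residues to differ, consistent with the “about 5-6 residues (3%)” stated above.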


Homology in this context means sequence similarity or identity, with identity being preferred. As is known in the art, a number of different programs can be used to identify whether a protein (or nucleic acid, as discussed below) has sequence identity or similarity to a known sequence. Sequence identity and/or similarity is determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, Wis.), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12:387-395 (1984), preferably using the default settings, or inspection. Preferably, percent identity is calculated by FastDB based upon the following parameters: mismatch penalty of 1; gap penalty of 1; gap size penalty of 0.33; and joining penalty of 30 (“Current Methods in Sequence Comparison and Analysis,” Macromolecule Sequencing and Synthesis, Selected Methods and Applications, pp. 127-149 (1988), Alan R. Liss, Inc.).
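The Needleman & Wunsch global alignment cited above can be sketched as a dynamic program with traceback. The scoring here (match +1, mismatch -1, linear gap -2) is a simplifying assumption for illustration; it is not the FastDB parameter set given above, and production tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACDEF", "ACEF"))  # ('ACDEF', 'AC-EF', 2)
```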

An example of a useful algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 25:351-360 (1987); the method is similar to that described by Higgins & Sharp, CABIOS 5:151-153 (1989). Useful PILEUP parameters include a default gap weight of 3.00, a default gap length weight of 0.10, and weighted end gaps.

Another example of a useful algorithm is the BLAST algorithm, described in Altschul et al., J. Mol. Biol. 215:403-410 (1990); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); and Karlin et al., Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877 (1993). A particularly useful BLAST program is the WU-BLAST-2 program, which was obtained from Altschul et al., Methods in Enzymology 266:460-480 (1996); http://blast.wustl.edu/blast/README.html. WU-BLAST-2 uses several search parameters, most of which are set to the default values. The adjustable parameters are set with the following values: overlap span = 1, overlap fraction = 0.125, word threshold (T) = 11. The HSP S and HSP S2 parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence and the composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity.

An additional useful algorithm is gapped BLAST, as reported by Altschul et al., Nucl. Acids Res. 25:3389-3402 (1997). Gapped BLAST uses BLOSUM-62 substitution scores; a threshold T parameter set to 9; the two-hit method to trigger ungapped extensions; a cost of 10+k for a gap of length k; X_u set to 16; and X_g set to 40 for the database search stage and to 67 for the output stage of the algorithms. Gapped alignments are triggered by a score corresponding to about 22 bits.

A % amino acid sequence identity value is determined by the number of matching identical residues divided by the total number of residues of the “longer” sequence in the aligned region. The “longer” sequence is the one having the most actual residues in the aligned region (gaps introduced by WU-BLAST-2 to maximize the alignment score are ignored).

In a similar manner, “percent (%) nucleic acid sequence identity” with respect to the coding sequence of the polypeptides identified herein is defined as the percentage of nucleotide residues in a candidate sequence that are identical with the nucleotide residues in the coding sequence of the GH. A preferred method uses the BLASTN module of WU-BLAST-2 set to the default parameters, with overlap span and overlap fraction set to 1 and 0.125, respectively.

The alignment may include the introduction of gaps in the sequences to be aligned. In addition, for sequences which contain either more or fewer amino acids than the protein encoded by the sequence of FIG. 1C (SEQ ID NO:2), it is understood that in one embodiment, the percentage of sequence identity will be determined based on the number of identical amino acids in relation to the total number of amino acids. Thus, for example, sequence identity of sequences shorter than those shown in FIGS. 1A-1C (SEQ ID NO:1, SEQ ID NO:14, SEQ ID NO:2), as discussed below, will be determined using the number of amino acids in the shorter sequence, in one embodiment. In percent identity calculations, relative weight is not assigned to various manifestations of sequence variation, such as insertions, deletions, substitutions, etc.

In one embodiment, only identities are scored positively (+1) and all forms of sequence variation, including gaps, are assigned a value of “0”, which obviates the need for a weighted scale or parameters as described below for sequence similarity calculations. Percent sequence identity can be calculated, for example, by dividing the number of matching identical residues by the total number of residues of the “shorter” sequence in the aligned region and multiplying by 100. The “shorter” sequence is the one having the fewest actual residues in the aligned region.
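Once an alignment is available, the identity-only scoring described above (identities +1, everything else 0) reduces to counting identical aligned pairs and dividing by the length of one of the two sequences. A minimal sketch, assuming gaps are written as '-' and treating the whole alignment as the aligned region; the `denominator` switch covers both the “shorter” and the “longer” conventions used in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: str = "shorter") -> float:
    """Percent identity of two gapped, equal-length aligned strings.

    Identical residue pairs score 1; mismatches and gap positions score 0.
    Sequence lengths count actual residues only (gaps are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    len_a = sum(c != "-" for c in aligned_a)
    len_b = sum(c != "-" for c in aligned_b)
    if denominator == "shorter":
        residues = min(len_a, len_b)
    elif denominator == "longer":
        residues = max(len_a, len_b)
    else:
        raise ValueError("denominator must be 'shorter' or 'longer'")
    return 100.0 * identical / residues


print(percent_identity("ACDEF", "AC-EF", "shorter"))  # 100.0
print(percent_identity("ACDEF", "AC-EF", "longer"))   # 80.0
```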

Thus, GHA proteins of the present invention may be shorter or longer than the amino acid sequence shown in FIG. 1A (SEQ ID NO:1). Thus, in a preferred embodiment, included within the definition of GHA proteins are portions or fragments of the sequences depicted herein. Fragments of GHA proteins are considered GHA proteins if they (a) share at least one antigenic epitope, (b) have at least the indicated homology, and (c) preferably have GHA biological activity as defined herein.

In a preferred embodiment, as is more fully outlined below, the GHA proteins include amino acid variations, as compared to a wild type GH, in addition to those outlined herein. In addition, as outlined herein, any of the variations depicted herein may be combined in any way to form additional novel GHA proteins.

In addition, GHA proteins can be made that are longer than those depicted in the figures, for example, by the addition of epitope or purification tags, as outlined herein, the addition of other fusion sequences, etc. For example, the GHA proteins of the invention may be fused to other therapeutic proteins such as IL-11, or to other proteins such as Fc or serum albumin for pharmacokinetic purposes. See, for example, U.S. Pat. Nos. 5,766,883 and 5,876,969, both of which are expressly incorporated by reference.

In a preferred embodiment, the GHA proteins comprise variable residues at core and boundary positions.

hGH core residues are as follows: positions 6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78, 79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124, 157, 161, 162, 163, 166, 170, 173, 176, 177, 180, and 184 (see FIGS. 3A and 4A). Accordingly, in a preferred embodiment, GHA proteins have variable positions selected from these positions.

In a preferred embodiment, GHA proteins have variable positions selected solely from core residues of hGH. Alternatively, at least a majority (51%) of the variable positions are selected from core residues, with at least about 75% of the variable positions being preferably selected from core residue positions, and at least about 90% of the variable positions being particularly preferred. A specifically preferred embodiment has only core variable positions altered as compared to hGH.
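To check which of these embodiments a given variant falls into, one can compute the fraction of its variable positions that lie in the core. The sketch below copies the core position list from the text; the parsing regex and function name are illustrative assumptions, and the regex accepts the document's multi-letter "Hsp" label as a new residue.

```python
import re

# hGH core positions, copied from the list above (see FIGS. 3A and 4A).
HGH_CORE = {
    6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78,
    79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124,
    157, 161, 162, 163, 166, 170, 173, 176, 177, 180, 184,
}

# <wild-type residue><position><new residue>, e.g. "A13V"; the new residue
# may also be a multi-letter label such as "Hsp".
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z][a-z]*)")


def core_fraction(mutations: list[str]) -> float:
    """Fraction of a variant's variable positions that are hGH core positions."""
    positions = set()
    for m in mutations:
        match = MUTATION.fullmatch(m)
        if match is None:
            raise ValueError(f"unrecognized mutation: {m}")
        positions.add(int(match.group(2)))
    return len(positions & HGH_CORE) / len(positions)


fig_4b = ["A13V", "T27V", "Y28F", "F54Y", "S55A", "S79A",
          "S85A", "V90I", "L114M", "G161M", "S184A"]
print(core_fraction(fig_4b))  # 1.0: every position is a core position
```

The result can be compared directly against the 0.51, 0.75 and 0.90 thresholds given above.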

Particularly preferred embodiments in which GHA proteins have variable core positions as compared to hGH are shown in FIGS. 4A-4E (SEQ ID NOS:3-6).

In one embodiment, the variable core positions are altered to any of the other 19 amino acids. In a preferred embodiment, the variable core residues are chosen from Ala, Val, Leu, Ile, Phe, Tyr, Trp, Met or Ser. In another preferred embodiment, the variable residues are chosen from Ala, Val, Leu, Ile, Phe, Tyr, Trp, Asp, Asn, Glu, Gln, Lys, Ser, Thr, Hsp (a positively charged histidine), Arg, Met, His, or Gly.

In a preferred embodiment, the GHA protein of the invention has a sequence that differs from a wild-type hGH in at least one amino acid position selected from positions 6, 7, 10, 13, 14, 17, 20, 24, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 40, 43, 44, 50, 54, 55, 56, 57, 58, 59, 66, 70, 71, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 87, 90, 92, 93, 96, 97, 98, 100, 102, 104, 105, 106, 107, 109, 110, 111, 113, 114, 115, 117, 118, 121, 124, 125, 130, 132, 135, 137, 138, 139, 140, 141, 142, 143, 144, 145, 156, 157, 158, 159, 161, 162, 163, 166, 170, 173, 176, 177, 180, 183, 184, 185, and 188 (see also FIGS. 3 to 7, which outline sets of amino acid positions).

In another preferred embodiment, the GHA protein of the invention has a sequence that differs from a wild-type hGH in at least one amino acid position selected from positions 6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78, 79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124, 157, 161, 162, 163, 166, 170, 173, 176, 177, 180, and 184 (see also FIGS. 3A-D and 4A, which outline this set of amino acid positions).

In one aspect of this embodiment, preferred amino acid changes within the CORE region are as follows: L6M; L6V; A13V; A13I; A17M; L20M; T27V; Y28F; Y28L; F31W; F31I; F31M; F31L; F31Y; F31V; I36L; I36V; F54Y; S55A; I58V; I58L; L73M; L73A; L75M; L76M; I78F; I78M; I78V; I78L; S79A; S79V; L81I; L82I; L82M; L82V; I83L; I83V; S85A; V90I; V90M; F97L; V110M; V110L; V110I; L114M; L114F; L117M; L117A; I121V; I121M; I121L; G161M; G161L; G161F; L162V; L162I; L162M; L163M; F166M; F166L; M170F; M170L; V173I; F176Y; L177I; and S184A (see FIG. 4A). These may be done either individually or in combination, with any combination possible. However, as outlined herein, preferred embodiments utilize at least five, and preferably more, variable positions in each GHA protein.
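The substitution notation used throughout (e.g. "A13V": Ala at position 13 replaced by Val) can be applied programmatically. The sketch below is a minimal illustration, assuming 1-based positions and single-letter residues; it does not handle the "Hsp" label. It checks each stated wild-type residue against the unmutated sequence, which catches entries that are mis-numbered or written in the wrong order. The toy sequence is hypothetical.

```python
import re

MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(wild_type: str, mutations: list[str]) -> str:
    """Apply substitutions written as <wild-type><position><new>, 1-based.

    Each stated wild-type residue is checked against the unmutated sequence,
    so a mis-typed entry such as "6LM" (instead of "L6M") or a wrong residue
    letter is rejected rather than silently applied.
    """
    seq = list(wild_type)
    for m in mutations:
        match = MUTATION.fullmatch(m)
        if match is None:
            raise ValueError(f"unrecognized mutation: {m}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(wild_type):
            raise ValueError(f"{m}: position outside sequence")
        if wild_type[pos - 1] != wt:
            raise ValueError(f"{m}: expected {wt} at {pos}, found {wild_type[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)


toy = "FPTIPLSRLF"  # hypothetical 10-residue string for illustration
print(apply_mutations(toy, ["L6M", "F10W"]))  # FPTIPMSRLW
```

Checking against the original rather than the progressively mutated sequence keeps the result independent of the order in which the substitutions are listed.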

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, L114M, G161M, and S184A (see FIG. 4B)(SEQ ID NO:3).

In one particularly preferred embodiment, a preferred GHA protein comprises the following changes: A13V, T27V, S79A, V90I, G161M, and S184A (see FIG. 4C)(SEQ ID NO:4).

In another particularly preferred embodiment, a preferred GHA protein comprises the following changes: A13V, T27V, S55A, S79A, S85A, V90I, G161M, and S184A (see FIG. 4D)(SEQ ID NO:5).

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, G161M, and S184A (see FIG. 4E)(SEQ ID NO:6).

In another preferred embodiment, the GHA protein of the invention has a sequence that differs from a wild-type hGH in at least one amino acid position selected from positions 6, 14, 26, 30, 32, 34, 35, 40, 50, 56, 57, 59, 66, 71, 74, 84, 92, 107, 109, 113, 118, 125, 130, 139, 143, 157, 158, and 183 (see also FIGS. 3 and 5A, which outline this set of amino acid positions).

In one aspect of this embodiment, preferred amino acid changes within the BOUNDARY1 region are as follows: L6I; L6E; M14L; M14I; D26R; D26A; D26K; D26L; D26M; E30W; E30F; E30V; A34K; A34H; A34W; Y35E; Y35D; Q40K; Q40V; Q40R; Q40Y; Q40I; Q40L; Q40M; T50F; T50L; E56F; S57K; S57R; S57I; S57H; S57V; S57F; S57Y; S57A; S57L; S57Q; P59E; P59V; S71H; S71E; E74F; E74W; Q84R; Q84I; F92R; F92K; F92H; F92E; F92Y; F92V; F92L; D107A; D107V; D107E; N109F; N109R; N109L; N109I; N109V; N109K; N109M; L113F; E118L; M125I; M125V; D130R; D130V; D130H; D130Hsp; D130A; F139H; F139A; F139H; F139Hsp; F139E; Y143D; Y143A; L157R; L157M; L157V; L157H; K158F; K158E; K158L; K158V; R183H; R183K; R183F; and R183I (see FIG. 5A). These may be done either individually or in combination, with any combination possible. However, as outlined herein, preferred embodiments utilize at least five, and preferably more, variable positions in each GHA protein.

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: M14L, D26A, E30V, A34W, Q40K, T50F, S57E, P59V, S71H, Q84R, F92E, D107A, N109F, E118L, M125I, D130R, F139H, Y143D, and R183H (see FIG. 5B)(SEQ ID NO:7).

In one particularly preferred embodiment, a preferred GHA protein comprises the following changes: M14L, D26A, E30W, A34K, Y35E, Q40K, T50F, S57K, P59E, S71H, Q84R, F92R, N109F, E118L, M125I, D130R, F139H, Y143D, K158F, and R183H (see FIG. 5C)(SEQ ID NO:8).

In one preferred embodiment, the GHA protein of the invention has a sequence that differs from a wild-type hGH in at least one amino acid position selected from positions 7, 29, 43, 70, 77, 87, 98, 100, 102, 104, 106, 111, 115, 132, 137, 140, 141, 142, 156, 159, 161, 184, 185, and 188 (see also FIGS. 3A-D and 6A, which outline this set of amino acid positions).

In one aspect of this embodiment, preferred amino acid changes within the BOUNDARY2 region are as follows: S7K; S7Y; S7R; S7F; S7L; S7V; Q29K; Q29I; Q29R; Q29V; S43K; S43R; S43W; S43I; S43V; S43H; S43M; S43L; S43F; K70L; K70M; K70R; R77M; R77L; R77V; L87I; L87M; L87V; L87Y; E98V; E98M; E98A; E98K; S100A; V102I; S106K; S106A; S106R; S106H; S106M; S106L; Y111R; Y111K; K115R; K115H; S132A; Q137W; Q137R; Q137A; Q141K; Q141F; T142V; L156M; L156I; L156V; L156W; L156K; L156R; L156T; L156Y; L156A; N159F; N159M; N159I; N159W; G161M; G161L; G161F; S184A; V185I; and S188A (see FIG. 6A). These may be done either individually or in combination, with any combination possible. However, as outlined herein, preferred embodiments utilize at least five, and preferably more, variable positions in each GHA protein.

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: S7K, Q29K, S43K, R77M, A98V, S100A, S106K, Y111R, S132A, Q137W, Q141K, T142V, N159F, G161M, S184A, and S188A (see FIG. 6B)(SEQ ID NO:9).

In another preferred embodiment, the GHA protein of the invention has a sequence that differs from a wild-type hGH in at least one amino acid position selected from positions 7, 14, 26, 29, 30, 34, 40, 43, 50, 57, 70, 77, 84, 87, 92, 98, 100, 102, 104, 106, 109, 111, 115, 118, 125, 132, 135, 137, 138, 140, 141, 142, 143, 144, 145, 147, 156, 159, 161, 184, 185, and 188 (see also FIGS. 3A-D and 7A, which outline this set of amino acid positions).

In one aspect of this embodiment, preferred amino acid changes within the CLUSTERED BOUNDARY region are as follows: D26K; D26L; D26R; D26M; D26A; Q29I; Q29V; Q29L; Q29K; E30V; E30I; E30K; E30L; E30W; E30R; A34W; A34F; A34L; A34K; Q40V; Q40W; Q40I; Q40R; Q40K; Q40L; Q40Y; Q40M; Q40H; S43W; S43K; S43R; S43F; S43H; S43I; S43M; T50F; T50M; R77M; Q84M; Q84W; Q84V; F92V; F92Y; F92R; F92A; F92K; F92L; S100A; V102I; N109I; N109V; N109F; N109M; N109W; N109L; N109Y; N109A; N109K; N109R; Y111R; Y111K; E118M; E118F; E118K; E118L; M125I; M125V; S132A; Q137R; Q137F; Q137H; Q137Y; Q141F; T142V; T142K; T142Y; T142R; Y143V; Y143A; and K145A (see FIG. 7A). These may be done either individually or in combination, with any combination possible. However, as outlined herein, preferred embodiments utilize at least five, and preferably more, variable positions in each GHA protein.

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: D26K, Q29I, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, Q137R, Q141F, T142V, Y143V, and K145A (see FIG. 7B)(SEQ ID NO:10).

In one particularly preferred embodiment, a preferred GHA protein comprises the following changes: D26K, Q29I, E30V, A34W, Q40W, S43W, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see FIG. 7C)(SEQ ID NO:11).

In another particularly preferred embodiment, a preferred GHA protein comprises the following changes: D26K, Q29I, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see FIG. 7D)(SEQ ID NO:12).

In a particularly preferred embodiment, a preferred GHA protein comprises the following changes: D26E, Q29K, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, N109F, Y111R, E118K, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see FIG. 7E)(SEQ ID NO:13).

In a preferred embodiment, the GHA proteins of the invention are hGH conformers. By “conformer” herein is meant a protein that has a protein backbone 3D structure that is virtually the same but has significant differences in the amino acid side chains. That is, the GHA proteins of the invention define a conformer set, wherein all of the proteins of the set share a backbone structure and yet have sequences that differ by at least 3-5%. The three-dimensional backbone structure of a GHA protein thus substantially corresponds to the three-dimensional backbone structure of hGH. “Backbone” in this context means the non-side-chain atoms: the nitrogen, carbonyl carbon and oxygen, and the α-carbon, and the hydrogens attached to the nitrogen and α-carbon. To be considered a conformer, a protein must have backbone atoms that are no more than 2 Å from the hGH structure, with no more than 1.5 Å being preferred, and no more than 1 Å being particularly preferred. In general, these distances may be determined in two ways. In one embodiment, each potential conformer is crystallized and its three-dimensional structure determined. Alternatively, as the former is quite tedious, the sequence of each potential conformer is run in the PDA program to determine whether it is a conformer.
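The distance criterion can be checked once backbone coordinates are available. The sketch below assumes the variant and hGH structures have already been superimposed and that both coordinate lists hold the same backbone atoms in the same order; it interprets "no more than 2 Å" as a maximum per-atom deviation, which is an assumption (an RMSD-based reading is also possible). Function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def max_backbone_deviation(variant: List[Coord], reference: List[Coord]) -> float:
    """Largest distance (Å) between matched backbone atoms.

    Assumes both structures are already superimposed and list the same
    backbone atoms (N, C, O, CA and their hydrogens) in the same order.
    """
    if len(variant) != len(reference):
        raise ValueError("backbone atom lists must be the same length")
    return max(math.dist(p, q) for p, q in zip(variant, reference))


def conformer_threshold(variant: List[Coord], reference: List[Coord]) -> Optional[float]:
    """Tightest cutoff (1.0, 1.5 or 2.0 Å) satisfied, or None if not a conformer."""
    d = max_backbone_deviation(variant, reference)
    for cutoff in (1.0, 1.5, 2.0):
        if d <= cutoff:
            return cutoff
    return None


hgh = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
print(conformer_threshold([(0.0, 0.0, 1.8), (1.5, 0.0, 0.0)], hgh))  # 2.0
```

In practice the superposition step (for example, a least-squares fit of the backbone atoms) would precede this check; it is omitted here to keep the sketch short.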

GHA proteins may also be identified as being encoded by GHA nucleic acids. In the case of the nucleic acid, the overall homology of the nucleic acid sequence is commensurate with amino acid homology but takes into account the degeneracy in the genetic code and codon bias of different organisms. Accordingly, the nucleic acid sequence homology may be either lower or higher than that of the protein sequence, with lower homology being preferred.

In a preferred embodiment, a GHA nucleic acid encodes a GHA protein. As will be appreciated by those in the art, due to the degeneracy of the genetic code, an extremely large number of nucleic acids may be made, all of which encode the GHA proteins of the present invention. Thus, having identified a particular amino acid sequence, those skilled in the art could make any number of different nucleic acids by simply modifying the sequence of one or more codons in a way which does not change the amino acid sequence of the GHA.
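The scale of this degeneracy can be made concrete by multiplying the number of codons available for each residue. The sketch below assumes the standard genetic code (61 sense codons, 3 stop codons) and ignores codon bias; the function name is hypothetical.

```python
from math import prod

# Sense codons per amino acid in the standard genetic code (61 in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(protein: str, stop_codons: int = 3) -> int:
    """Number of distinct coding sequences that translate to `protein`.

    Multiplies the codon choices at every residue; `stop_codons` counts the
    choices for a terminal stop codon (set to 1 to ignore that choice).
    """
    return prod(CODONS_PER_AA[aa] for aa in protein) * stop_codons


print(count_encoding_sequences("MASK"))  # 1 * 4 * 6 * 2 * 3 = 144
```

Even a four-residue peptide has 144 encodings; for a protein of roughly 190 residues the count grows into an astronomically large number, which is why the text treats the set of encoding nucleic acids as effectively unlimited.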

In one embodiment, nucleic acid homology is determined through hybridization studies. Thus, for example, nucleic acids which hybridize under high stringency to the nucleic acid sequence shown in FIG. 1C (SEQ ID NO:2) or its complement, and which encode a GHA protein, are considered GHA genes.

High stringency conditions are known in the art; see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al., both of which are hereby incorporated by reference. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30°C for short probes (e.g. 10 to 50 nucleotides) and at least about 60°C for long probes (e.g. greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
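As a rough numerical illustration of the "5-10°C below Tm" guideline, the sketch below estimates Tm with the Wallace rule (2°C per A/T plus 4°C per G/C). The Wallace rule is a swapped-in rule of thumb for short oligonucleotides in high salt, not a method stated in the text; it ignores salt concentration, formamide and probe concentration, so real conditions should be set from a proper Tm calculation or empirically. Function names and the probe sequence are hypothetical.

```python
from typing import Tuple


def wallace_tm(probe: str) -> float:
    """Rough Tm in °C by the Wallace rule: 2 °C per A/T plus 4 °C per G/C.

    A rule of thumb for short oligonucleotides in high salt only; it ignores
    salt concentration, formamide and probe concentration.
    """
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float, below_min: float = 5.0, below_max: float = 10.0) -> Tuple[float, float]:
    """Hybridization temperature range about 5-10 °C below Tm."""
    return (tm - below_max, tm - below_min)


tm = wallace_tm("GGCATCGTACGTAGCA")  # 7 A/T and 9 G/C -> 50.0
print(tm, stringent_window(tm))      # 50.0 (40.0, 45.0)
```

The resulting window should still respect the minimum temperatures given above (about 30°C for short probes, about 60°C for long probes).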

In another embodiment, less stringent hybridization conditions are used; for example, moderate or low stringency conditions may be used, as are known in the art; see Sambrook et al., supra; Ausubel, supra; and Tijssen, supra.

The GHA proteins and nucleic acids of the present invention are recombinant. As used herein, “nucleic acid” may refer to either DNA or RNA, or molecules which contain both deoxy- and ribonucleotides. The nucleic acids include genomic DNA, cDNA and oligonucleotides, including sense and anti-sense nucleic acids. Such nucleic acids may also contain modifications in the ribose-phosphate backbone to increase stability and half-life of such molecules in physiological environments.

The nucleic acid may be double stranded, single stranded, or contain portions of both double stranded and single stranded sequence. As will be appreciated by those in the art, the depiction of a single strand (“Watson”) also defines the sequence of the other strand (“Crick”); thus the sequence depicted in FIG. 1C (SEQ ID NO:2) also includes the complement of the sequence. By the term “recombinant nucleic acid” herein is meant nucleic acid, originally formed in vitro, in general by the manipulation of nucleic acid by endonucleases, in a form not normally found in nature. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purpose of the invention. Thus an isolated GHA nucleic acid, in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined, are both considered recombinant for the purposes of this invention.

Similarly, a “recombinant protein” is a protein made using recombinant techniques, i.e. through the expression of a recombinant nucleic acid as depicted above. A recombinant protein is distinguished from naturally occurring protein by at least one or more characteristics. For example, the protein may be isolated or purified away from some or all of the proteins and compounds with which it is normally associated in its wild-type host, and thus may be substantially pure. For example, an isolated protein is unaccompanied by at least some of the material with which it is normally associated in its natural state, preferably constituting at least about 0.5%, more preferably at least about 5%, by weight of the total protein in a given sample. A substantially pure protein comprises at least about 75% by weight of the total protein, with at least about 80% being preferred, and at least about 90% being particularly preferred. The definition includes the production of a GHA protein from one organism in a different organism or host cell. Alternatively, the protein may be made at a significantly higher concentration than is normally seen, through the use of an inducible promoter or high expression promoter, such that the protein is made at increased concentration levels. Furthermore, all of the GHA proteins outlined herein are in a form not normally found in nature, as they contain amino acid substitutions, insertions and deletions, with substitutions being preferred, as discussed below.

Also included within the definition of GHA proteins of the present invention are amino acid sequence variants of the GHA sequences outlined herein and shown in the FIGS. That is, the GHA proteins may contain additional variable positions as compared to hGH. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding a GHA protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above. However, variant GHA protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the GHA protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics, as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed GHA variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of GHA protein activities.

Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in some cases deletions may be much larger.

Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances. When small alterations in the characteristics of the GHA protein are desired, substitutions are generally made in accordance with the following chart:

CHART I

Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser, Ala
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu

Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I. For example, substitutions may be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine.

The variants typically exhibit the same qualitative biological activity and will elicit the same immune response as the original GHA protein, although variants also are selected to modify the characteristics of the GHA proteins as needed. Alternatively, the variant may be designed such that the biological activity of the GHA protein is altered. For example, glycosylation sites may be altered or removed. Similarly, the biological function may be altered; for example, in some instances it may be desirable to have more or less potent hGH activity.

The GHA proteins and nucleic acids of the invention can be made in a number of ways. Individual nucleic acids and proteins can be made as known in the art and outlined below. Alternatively, libraries of GHA proteins can be made for testing.

In a preferred embodiment, sets or libraries of GHA proteins are generated from a probability distribution table. As outlined herein, there are a variety of methods of generating a probability distribution table, including using PDA, sequence alignments, forcefield calculations such as SCMF calculations, etc. In addition, the probability distribution can be used to generate information entropy scores for each position, as a measure of the mutational frequency observed in the library.
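The text does not specify the entropy formula; a common choice is Shannon entropy of each position's residue distribution, sketched below under that assumption (in bits, with frequencies renormalized to sum to 1). The position numbers and frequencies in the example table are hypothetical.

```python
import math


def position_entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy, in bits, of one position's residue distribution.

    0.0 means the position is fixed; higher values mean more variability.
    """
    total = sum(freqs.values())
    return sum((f / total) * math.log2(total / f) for f in freqs.values() if f > 0)


table = {
    13: {"A": 0.5, "V": 0.5},
    27: {"T": 1.0},
    31: {"F": 0.25, "W": 0.25, "I": 0.25, "L": 0.25},
}
for pos, freqs in table.items():
    print(pos, position_entropy(freqs))  # 13 1.0, then 27 0.0, then 31 2.0
```

The maximum possible value is log2(20), about 4.32 bits, reached when all 20 amino acids are equally likely at a position.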

In this embodiment, the frequency of each amino acid residue at each variable position in the list is identified. Frequencies can be thresholded, wherein any variant frequency lower than a cutoff is set to zero. This cutoff is preferably 1%, 2%, 5%, 10% or 20%, with 10% being particularly preferred. These frequencies are then built into the GHA library. That is, as above, these variable positions are collected and all possible combinations are generated, but the amino acid residues that “fill” the library are utilized on a frequency basis. Thus, in a non-frequency-based library, a variable position that has 5 possible residues will have 20% of the proteins comprising that variable position with the first possible residue, 20% with the second, etc. However, in a frequency-based library, a variable position that has 5 possible residues with frequencies of 10%, 15%, 25%, 30% and 20%, respectively, will have 10% of the proteins comprising that variable position with the first possible residue, 15% of the proteins with the second residue, 25% with the third, etc. As will be appreciated by those in the art, the actual frequency may depend on the method used to actually generate the proteins; for example, exact frequencies may be possible when the proteins are synthesized. However, when the frequency-based primer system outlined below is used, the actual frequencies at each position will vary, as outlined below.
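The thresholding and frequency-based filling described above can be sketched as follows. This is an illustration only: renormalizing the surviving frequencies after the cutoff is an assumption (the text says only that low frequencies are set to zero), and members are drawn at random, so an actual library would approximate the target frequencies rather than match them exactly. Function names and the position numbers are hypothetical.

```python
import random


def threshold(freqs: dict[str, float], cutoff: float = 0.10) -> dict[str, float]:
    """Set residues below `cutoff` to zero (drop them) and renormalize."""
    kept = {aa: f for aa, f in freqs.items() if f >= cutoff}
    if not kept:
        raise ValueError("cutoff removed every residue at this position")
    total = sum(kept.values())
    return {aa: f / total for aa, f in kept.items()}


def sample_library(table: dict[int, dict[str, float]], n: int, seed: int = 0) -> list[dict[int, str]]:
    """Draw n library members; each position is filled according to its frequencies."""
    rng = random.Random(seed)
    library = []
    for _ in range(n):
        member = {}
        for pos, freqs in table.items():
            residues = list(freqs)
            member[pos] = rng.choices(residues, weights=[freqs[aa] for aa in residues])[0]
        library.append(member)
    return library


# The five-residue example from the text: 10%, 15%, 25%, 30% and 20%.
raw = {"A": 0.10, "V": 0.15, "L": 0.25, "I": 0.30, "F": 0.20}
table = {6: threshold(raw), 13: threshold({"A": 0.95, "V": 0.05})}
library = sample_library(table, n=1000)
```

With the 10% cutoff, all five residues of the text's example survive, while the 5% residue at the second position is dropped, leaving that position fixed.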

As will be appreciated by those in the art and outlined herein, probability distribution tables can be generated in a variety of ways. In addition to the methods outlined herein, self-consistent mean field (SCMF) methods can be used in the direct generation of probability tables. SCMF is a deterministic computational method that uses a mean field description of rotamer interactions to calculate energies. A probability table generated in this way can be used to create libraries as described herein. SCMF can be used in three ways: the frequencies of amino acids and rotamers for each amino acid are listed at each position; the probabilities are determined directly from SCMF (see Delarue et al., Pac. Symp. Biocomput. 109-21 (1997), expressly incorporated by reference). In addition, highly variable positions and non-variable positions can be identified. Alternatively, another method is used to determine what sequence is jumped to during a search of sequence space; SCMF is used to obtain an accurate energy for that sequence; this energy is then used to rank it and create a rank-ordered list of sequences (similar to a Monte Carlo sequence list). A probability table showing the frequencies of amino acids at each position can then be calculated from this list (Koehl et al., J. Mol. Biol. 239:249 (1994); Koehl et al., Nat. Struct. Biol. 2:163 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222 (1996); Koehl et al., J. Mol. Biol. 293:1183 (1999); Koehl et al., J. Mol. Biol. 293:1161 (1999); Lee, J. Mol. Biol. 236:918 (1994); and Vasquez, Biopolymers 36:53-70 (1995); all of which are expressly incorporated by reference). Similar methods include, but are not limited to, OPLS-AA (Jorgensen et al., J. Am. Chem. Soc. (1996), v 118, pp 11225-11236; Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)); OPLS (Jorgensen et al., J. Am. Chem. Soc. (1988), v 110, pp 1657ff; Jorgensen et al., J. Am. Chem. Soc. (1990), v 112, pp 4768ff); UNRES (United Residue Forcefield; Liwo et al., Protein Science (1993), v 2, pp 1697-1714; Liwo et al., Protein Science (1993), v 2, pp 1715-1731; Liwo et al., J. Comp. Chem. (1997), v 18, pp 849-873; Liwo et al., J. Comp. Chem. (1997), v 18, pp 874-884; Liwo et al., J. Comp. Chem. (1998), v 19, pp 259-276); Forcefield for Protein Structure Prediction (Liwo et al., Proc. Natl. Acad. Sci. USA (1999), v 96, pp 5482-5485); ECEPP/3 (Liwo et al., J. Protein Chem. 1994 May;13(4):375-80); AMBER 1.1 force field (Weiner et al., J. Am. Chem. Soc. v 106, pp 765-784); AMBER 3.0 force field (U.C. Singh et al., Proc. Natl. Acad. Sci. USA 82:755-759); CHARMM and CHARMM22 (Brooks et al., J. Comp. Chem. v 4, pp 187-217); cvff3.0 (Dauber-Osguthorpe et al. (1988), Proteins: Structure, Function and Genetics, v 4, pp 31-47); cff91 (Maple et al., J. Comp. Chem. v 15, 162-182); also, the DISCOVER (cvff and cff91) and AMBER forcefields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego, Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego, Calif.).

In addition, as outlined herein, a preferred method of generating a probability distribution table is through the use of sequence alignment programs. In addition, the probability table can be obtained by a combination of sequence alignments and computational approaches. For example, one can add amino acids found in the alignment of homologous sequences to the result of the computation. Preferably, one can add the wild-type amino acid identity to the probability table if it is not found in the computation.

As will be appreciated, a GHA protein library created by recombining variable positions and/or residues at the variable positions may not be in a rank-ordered list. In some embodiments, the entire list may simply be made and tested. Alternatively, in a preferred embodiment, the GHA protein library is also in the form of a rank-ordered list. This may be done for several reasons, including that the library is still too big to generate experimentally, or for predictive purposes. This may be done in several ways. In one embodiment, the library is ranked using the scoring functions of PDA. Alternatively, statistical methods could be used. For example, the library may be ranked by frequency score; that is, proteins containing the most high-frequency residues could be ranked higher, etc. This may be done by adding or multiplying the frequencies at each variable position to generate a numerical score. Similarly, the different positions of the library could be weighted and the proteins then scored; for example, those containing certain residues could be arbitrarily ranked.
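Frequency-score ranking, by adding or multiplying the frequency of each member's residue at every variable position, can be sketched as below. This assumes each library member is a mapping from variable position to residue and that the probability table maps each position to residue frequencies; the names and numbers are hypothetical.

```python
import math


def frequency_score(member: dict[int, str], table: dict[int, dict[str, float]],
                    combine: str = "product") -> float:
    """Score a member by summing or multiplying its per-position frequencies."""
    freqs = [table[pos][aa] for pos, aa in member.items()]
    if combine == "sum":
        return sum(freqs)
    if combine == "product":
        return math.prod(freqs)
    raise ValueError("combine must be 'sum' or 'product'")


def rank_library(members: list[dict[int, str]], table: dict[int, dict[str, float]],
                 combine: str = "product") -> list[dict[int, str]]:
    """Members sorted from highest to lowest frequency score."""
    return sorted(members, key=lambda m: frequency_score(m, table, combine), reverse=True)


table = {1: {"A": 0.7, "V": 0.3}, 2: {"L": 0.6, "I": 0.4}}
members = [{1: "V", 2: "I"}, {1: "A", 2: "L"}, {1: "A", 2: "I"}]
for m in rank_library(members, table):
    print(m)  # {1: 'A', 2: 'L'}, then {1: 'A', 2: 'I'}, then {1: 'V', 2: 'I'}
```

The product treats positions as independent probabilities, so a single rare residue pulls a member down sharply; the sum is more forgiving. Which behavior is preferable depends on the library, and the text leaves the choice open.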

In a preferred embodiment, the different protein members of the GHA protein library may be chemically synthesized. This is particularly useful when the designed proteins are short, preferably less than 150 amino acids in length, with less than 100 amino acids being preferred, and less than 50 amino acids being particularly preferred, although, as is known in the art, longer proteins can be made chemically or enzymatically. See, for example, Wilken et al., Curr. Opin. Biotechnol. 9:412-26 (1998), hereby expressly incorporated by reference.

In a preferred embodiment, particularly for longer proteins or proteins for which large samples are desired, the library sequences are used to create nucleic acids such as DNA which encode the member sequences and which can then be cloned into host cells, expressed and assayed, if desired. Thus, nucleic acids, and particularly DNA, can be made which encode each member protein sequence. This is done using well known procedures. The choice of codons, suitable expression vectors and suitable host cells will vary depending on a number of factors, and can be easily optimized as needed.

In a preferred embodiment, multiple PCR reactions with pooled oligonucleotides are performed, as is generally depicted in FIG. 8. In this embodiment, overlapping oligonucleotides are synthesized which correspond to the full-length gene. Again, these oligonucleotides may represent all of the different amino acids at each variant position, or subsets.

In a preferred embodiment, these oligonucleotides are pooled in equal proportions and multiple PCR reactions are performed to create full-length sequences containing the combinations of mutations defined by the library. In addition, this may be done using error-prone PCR methods.

In a preferred embodiment, the different oligonucleotides are added in relative amounts corresponding to the probability distribution table. The multiple PCR reactions thus result in full-length sequences with the desired combinations of mutations in the desired proportions.
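Translating a probability table into pooling amounts can be sketched as follows. The sketch assumes all variant oligonucleotide stocks are at the same concentration, so volume is proportional to amount, and that each position's variant oligos are mixed into a pool of fixed total volume; the oligo names and the 20 µl volume are hypothetical.

```python
def pool_volumes(table: dict[int, dict[str, float]], volume_ul: float) -> dict[str, float]:
    """Volume of each variant oligo to add to a pool.

    Oligos encoding alternatives at the same position are mixed in
    proportion to that position's frequencies, and each position's pool
    receives `volume_ul` in total. Oligo names are hypothetical labels,
    and equal stock concentrations are assumed.
    """
    volumes = {}
    for pos, freqs in table.items():
        total = sum(freqs.values())
        for aa, f in freqs.items():
            volumes[f"oligo_{pos}{aa}"] = volume_ul * f / total
    return volumes


print(pool_volumes({6: {"L": 0.5, "M": 0.25, "V": 0.25}}, volume_ul=20.0))
# {'oligo_6L': 10.0, 'oligo_6M': 5.0, 'oligo_6V': 5.0}
```

If stock concentrations differ, each volume would instead be scaled by the inverse of that oligo's concentration to keep the molar ratios matched to the table.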

The total number of oligonucleotides needed is a function of the number of positions being mutated and the number of mutations being considered at these positions:

$$\text{(total number of oligos required)} = \text{(number of oligos for constant positions)} + M_1 + M_2 + M_3 + \cdots + M_n,$$

where $M_n$ is the number of mutations considered at position $n$ in the sequence.
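The oligonucleotide count above is a simple sum; the sketch below computes it and, as an addition not stated in the text, also computes the number of distinct full-length combinations, which is the product of the $M_i$ under the assumption that each variable position sits on its own oligonucleotide and every variant oligo can pair with every other. Here $M_i$ counts every residue option at position $i$, including the wild-type residue if it is one of the options. The numbers are hypothetical.

```python
from math import prod


def total_oligos(constant_oligos: int, mutations_per_position: list[int]) -> int:
    """(oligos for constant positions) + M1 + M2 + ... + Mn."""
    return constant_oligos + sum(mutations_per_position)


def full_length_combinations(mutations_per_position: list[int]) -> int:
    """Distinct full-length sequences if each variable position is on its own oligo."""
    return prod(mutations_per_position)


m = [3, 2, 4]  # hypothetical M1, M2, M3
print(total_oligos(10, m))          # 19
print(full_length_combinations(m))  # 24
```

The contrast is the practical point of the pooled approach: the oligonucleotide count grows additively with the options at each position, while the library size grows multiplicatively.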

In a preferred embodiment, each overlapping oligonucleotide comprises only one position to be varied; in alternate embodiments, the variant positions are too close together to allow this, and multiple variants per oligonucleotide are used to allow complete recombination of all the possibilities. That is, each oligo can contain the codon for a single position being mutated, or for more than one position being mutated. The multiple positions being mutated must be close in sequence to keep the oligo length practical. For multiple mutating positions on an oligonucleotide, particular combinations of mutations can be included in or excluded from the library by including or excluding the oligonucleotide encoding that combination. For example, as discussed herein, there may be correlations between variable regions; that is, when position X is a certain residue, position Y must (or must not) be a particular residue. These sets of variable positions are sometimes referred to herein as a “cluster”. When a cluster consists of residues close together, and thus can reside on one oligonucleotide primer, the cluster can be set to the “good” correlations, and the bad combinations that may decrease the effectiveness of the library can be eliminated. However, if the residues of the cluster are far apart in sequence, and thus will reside on different oligonucleotides for synthesis, it may be desirable either to set the residues to the “good” correlation or to eliminate them as variable residues entirely. In an alternative embodiment, the library may be generated in several steps, so that the cluster mutations only appear together. This procedure, i.e. identifying mutation clusters and either placing them on the same oligonucleotides, eliminating them from the library, or generating the library in several steps that preserve the clusters, can considerably enrich the experimental library with properly folded protein. Identification of clusters can be carried out in a number of ways, e.g. by using known pattern recognition methods, by comparing frequencies of occurrence of mutations, or by using energy analysis of the sequences to be experimentally generated (for example, if the energy of interaction is high, the positions are correlated). These correlations may be positional correlations (e.g. variable positions 1 and 2 always change together or never change together) or sequence correlations (e.g. if there is residue A at position 1, there is always residue B at position 2). See: Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, edited by Jason T. L. Wang, Bruce A. Shapiro and Dennis Shasha, New York: Oxford University, 1999; Andrews, Harry C., Introduction to Mathematical Techniques in Pattern Recognition, New York: Wiley-Interscience, 1972; Applications of Pattern Recognition, edited by K. S. Fu, Boca Raton, Fla.: CRC Press, 1982; Genetic Algorithms for Pattern Recognition, edited by Sankar K. Pal and Paul P. Wang, Boca Raton: CRC Press, 1996; Pandya, Abhijit S. and Robert B. Macy, Pattern Recognition with Neural Networks in C++, Boca Raton, Fla.: CRC Press, 1996; Handbook of Pattern Recognition & Computer Vision, 2nd ed., edited by C. H. Chen, L. F. Pau and P. S. P. Wang, Singapore; River Edge, N.J.: World Scientific, 1999; Friedman, Introduction to Pattern Recognition: Statistical, Structural, Neural, and Fuzzy Logic Approaches, Series in Machine Perception and Artificial Intelligence, vol. 32, River Edge, N.J.: World Scientific, 1999; all of which are expressly incorporated by reference. In addition, programs that search for consensus motifs can be used as well.
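As a minimal sketch of detecting the sequence correlations described above, the following Python function flags pairs of variable positions in which the residue at one position always predicts the residue at the other across a set of designed sequences. It is a deliberately simple co-occurrence test, not any of the cited pattern-recognition methods and not the energy analysis; the function name and the toy library are hypothetical, and the sequences are assumed to be aligned to equal length.

```python
from collections import defaultdict
from itertools import combinations


def find_sequence_correlations(library, variable_positions):
    """Return (i, j) pairs, i < j, where the residue at i always predicts the residue at j.

    library: aligned, equal-length sequences (e.g. top-ranked designs).
    variable_positions: indices of the positions allowed to vary.
    Only the direction from the lower to the higher index is tested.
    """
    correlated = []
    for i, j in combinations(sorted(variable_positions), 2):
        partners = defaultdict(set)  # residue at i -> residues seen at j
        for seq in library:
            partners[seq[i]].add(seq[j])
        both_vary = len(partners) > 1 and len({s[j] for s in library}) > 1
        # Correlated if each residue at i co-occurs with exactly one residue at j.
        if both_vary and all(len(res_j) == 1 for res_j in partners.values()):
            correlated.append((i, j))
    return correlated


# Toy library: positions 0 and 2 change together (A with L, G with M).
toy = ["AKLE", "GKME", "AKLD", "GKMD"]
print(find_sequence_correlations(toy, [0, 2, 3]))  # [(0, 2)]
```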

In addition, correlations and shuffling can be fixed or optimized by altering the design of the oligonucleotides: that is, by deciding where the oligonucleotides (primers) start and stop (e.g. where the sequences are “cut”). The start and stop sites of oligos can be set to maximize the number of clusters that appear in single oligonucleotides, thereby enriching the library with higher scoring sequences. Different oligonucleotide start and stop site options can be computationally modeled and ranked according to the number of clusters that are represented on single oligos, or according to the percentage of the resulting sequences consistent with the predicted library of sequences.
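The ranking of oligonucleotide start and stop sites by the number of clusters kept on single oligos can be sketched as follows. This Python sketch assumes, for simplicity, that each oligonucleotide covers a contiguous, inclusive range of residue positions, and it ignores the overlaps needed for annealing; the layout names, positions and clusters are hypothetical.

```python
def clusters_on_single_oligos(oligo_spans, clusters):
    """Count clusters whose positions all fall inside one oligo.

    oligo_spans: list of (start, end) residue ranges, inclusive.
    clusters: list of sets of correlated variable positions.
    """
    return sum(
        any(all(start <= p <= end for p in cluster) for start, end in oligo_spans)
        for cluster in clusters
    )


def rank_layouts(layouts, clusters):
    """Order candidate layouts so that the one keeping most clusters intact comes first."""
    return sorted(
        layouts,
        key=lambda name: clusters_on_single_oligos(layouts[name], clusters),
        reverse=True,
    )


clusters = [{12, 15}, {40, 44}]
layouts = {
    "layout_a": [(1, 14), (15, 30), (31, 45)],  # splits cluster {12, 15}
    "layout_b": [(1, 20), (21, 38), (39, 50)],  # keeps both clusters intact
}
print(rank_layouts(layouts, clusters))  # ['layout_b', 'layout_a']
```

The second criterion named above (the percentage of resulting sequences consistent with the predicted library) could be used as an alternative sort key in the same way.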

The total number of oligonucleotides required increases when multiple mutable positions are encoded by a single oligonucleotide. The annealed regions are the ones that remain constant, i.e. have the sequence of the reference sequence.
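The reason for this increase can be illustrated with a short calculation. When two variable positions sit on separate oligonucleotides, their variant counts simply add; when they share one oligonucleotide, a separate oligo is needed for every combination to be included, so the counts multiply, less any combinations deliberately excluded as described above for clusters. The function name and residue choices in this Python sketch are hypothetical.

```python
from itertools import product


def region_variants(options_per_position, excluded=()):
    """List the residue combinations that one shared oligo region must encode.

    options_per_position: candidate residues for each variable position
    carried on the same oligonucleotide.
    excluded: combinations deliberately left out of the library.
    """
    excluded = set(excluded)
    return [combo for combo in product(*options_per_position) if combo not in excluded]


pos_x = ["A", "G", "S"]        # hypothetical choices at position X
pos_y = ["L", "M", "I", "V"]   # hypothetical choices at position Y

separate = len(pos_x) + len(pos_y)             # on different oligos
shared = len(region_variants([pos_x, pos_y]))  # on one oligo, all combinations
pruned = len(region_variants([pos_x, pos_y], excluded=[("G", "V")]))
print(separate, shared, pruned)  # 7 12 11
```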

Oligonucleotides with insertions or deletions of codons can be used to create a library expressing proteins of different lengths. In particular, computational sequence screening for insertions or deletions can result in secondary libraries defining proteins of different lengths, which can be expressed from a pooled library of oligonucleotides of different lengths.

In a preferred embodiment, the GHA library is generated by shuffling the family (e.g. a set of variants); that is, some set of the top sequences (if a rank-ordered list is used) can be shuffled, either with or without error-prone PCR. “Shuffling” in this context means a recombination of related sequences, generally in a random way. It can include “shuffling” as defined and exemplified in U.S. Pat. Nos. 5,830,721; 5,811,238; 5,605,793; 5,837,458 and PCT US/19256, all of which are expressly incorporated by reference in their entirety. This set of sequences can also be an artificial set; for example, one derived from a probability table (for example, generated using SCMF) or a Monte Carlo set. Similarly, the “family” can be the top 10 and the bottom 10 sequences, the top 100 sequences, etc. This may also be done using error-prone PCR.

Thus, in a preferred embodiment, in silico shuffling is done using the computational methods described herein. That is, starting with either two libraries or two sequences, random recombinations of the sequences can be generated and evaluated.
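A minimal sketch of in silico shuffling of two sequences is shown below. It assumes the two parent sequences are aligned to equal length and models recombination as a fixed number of randomly placed crossover points; the function name, the parent sequences and the seed are hypothetical, and the evaluation step (scoring each child with the computational methods described herein) is not shown.

```python
import random


def shuffle_pair(parent_a, parent_b, n_crossovers, rng):
    """Recombine two aligned, equal-length sequences at random crossover points."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    # Distinct cut points strictly inside the sequence, in ascending order.
    points = sorted(rng.sample(range(1, len(parent_a)), n_crossovers))
    parents = (parent_a, parent_b)
    pieces, source, last = [], 0, 0
    for point in points + [len(parent_a)]:
        pieces.append(parents[source][last:point])
        source ^= 1  # switch to the other parent at each crossover
        last = point
    return "".join(pieces)


rng = random.Random(0)
a, b = "MATGSRTSLLLAF", "MVTGSKTSLLIAY"  # hypothetical aligned parents
children = [shuffle_pair(a, b, 2, rng) for _ in range(5)]
```

Because each segment is copied from one parent at the same positions, every residue of a child comes from the corresponding position of one of the two parents.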

In a preferred embodiment, error-prone PCR is used to generate the GHA library. See U.S. Pat. Nos. 5,605,793, 5,811,238, and 5,830,721, all of which are hereby incorporated by reference. This can be done on the optimal sequence, on top members of the library, or on some other artificial set or family. In this embodiment, the gene for the optimal sequence found in the computational screen of the primary library can be synthesized. Error-prone PCR is then performed on the optimal sequence gene in the presence of oligonucleotides that code for the mutations at the variant positions of the library (bias oligonucleotides). The addition of the oligonucleotides creates a bias favoring the incorporation of the mutations in the library. Alternatively, oligonucleotides for only certain mutations may be used to bias the library.

In a preferred embodiment, gene shuffling with error-prone PCR can be performed on the gene for the optimal sequence, in the presence of bias oligonucleotides, to create a DNA sequence library that reflects the proportion of the mutations found in the GHA library. The bias oligonucleotides can be chosen in a variety of ways: they can be chosen on the basis of their frequency, i.e. oligonucleotides encoding high mutational frequency positions can be used; alternatively, oligonucleotides containing the most variable positions can be used, such that the diversity is increased; if the secondary library is ranked, some number of top scoring positions can be used to generate bias oligonucleotides; random positions may be chosen; a few top scoring and a few low scoring ones may be chosen; etc. What is important is to generate new sequences based on preferred variable positions and sequences.
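Of the options listed above, selection by mutational frequency is the simplest to illustrate. The Python sketch below counts, for each position, how many library members differ from the reference (e.g. optimal) sequence and returns the k most frequently mutated positions as candidates for bias oligonucleotides. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def top_mutated_positions(reference, library, k):
    """Return the k positions at which library members most often differ from the reference."""
    counts = Counter()
    for seq in library:
        for pos, (ref_res, res) in enumerate(zip(reference, seq)):
            if res != ref_res:
                counts[pos] += 1
    return [pos for pos, _ in counts.most_common(k)]


reference = "AKLE"
library = ["AKME", "GKMD", "AKMD", "AKLE"]
print(top_mutated_positions(reference, library, 2))  # [2, 3]
```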

In a preferred embodiment, PCR using a wild type gene or other gene can be used, as is schematically depicted in FIG. 9. In this embodiment, a starting gene is used; generally, although this is not required, it is the wild type gene. In some cases it may be the gene encoding the global optimized sequence, any other sequence of the list, or a consensus sequence obtained e.g. by aligning homologous sequences from different organisms. In this embodiment, oligonucleotides are used that correspond to the variant positions and contain the different amino acids of the library. PCR is done using PCR primers at the termini, as is known in the art. This provides two benefits: first, it generally requires fewer oligonucleotides and can result in fewer errors; second, if the wild type gene is used, it need not be synthesized.

In addition, several other techniques can be used, as exemplified in FIGS. 9 to 12. In a preferred embodiment, ligation of PCR products is done.

In a preferred embodiment, a variety of additional steps may be performed on the GHA protein library; for example, further computational processing can occur, different GHA protein libraries can be recombined, or cutoffs from different libraries can be combined. In a preferred embodiment, a GHA library may be computationally remanipulated to form an additional GHA protein library (sometimes referred to herein as a “tertiary library”). For example, any of the GHA protein library sequences may be chosen for a second round of PDA, by freezing or fixing some or all of the changed positions in the first library. Alternatively, only changes seen in the last probability distribution table are allowed. Alternatively, the stringency of the probability table may be altered, either by increasing or decreasing the cutoff for inclusion. Similarly, the GHA protein library may be recombined experimentally after the first round; for example, the best gene or genes from the first screen may be taken and gene assembly redone (using techniques outlined below, multiple PCR, error-prone PCR, shuffling, etc.). Alternatively, fragments from one or more good genes may be used to change probabilities at some positions. This biases the search to an area of sequence space found in the first round of computational and experimental screening.

In a preferred embodiment, a tertiary library can be generated by combining different GHA protein libraries. For example, a probability distribution table from a first GHA protein library can be generated and recombined, either computationally or experimentally, as outlined herein. A PDA GHA protein library may be combined with a sequence alignment GHA protein library, and either recombined (again, computationally or experimentally) or just the cutoffs from each joined to make a new tertiary library. The top sequences from several libraries can be recombined. Sequences from the top of a library can be combined with sequences from the bottom of the library to sample sequence space more broadly, or only sequences distant from the top of the library can be combined. GHA protein libraries that analyzed different parts of a protein can be combined into a tertiary library that treats the combined parts of the protein.

In a preferred embodiment, a tertiary library can be generated using correlations in a GHA protein library. That is, a residue at a first variable position may be correlated to a residue at a second variable position (or correlated to residues at additional positions as well). For example, two variable positions may sterically or electrostatically interact, such that if the first residue is X, the second residue must be Y. This may be either a positive or a negative correlation.
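A minimal sketch of building a tertiary library from such correlations is shown below. Each rule states that when a given residue occupies one variable position, a given residue must (positive correlation) or must not (negative correlation) occupy another. The Python function names, the rule format and the toy sequences are hypothetical.

```python
def satisfies_correlations(seq, rules):
    """Check a sequence against (pos_i, res_i, pos_j, res_j, must_match) rules.

    must_match=True:  if seq[pos_i] == res_i, seq[pos_j] must equal res_j.
    must_match=False: if seq[pos_i] == res_i, seq[pos_j] must not equal res_j.
    """
    for pos_i, res_i, pos_j, res_j, must_match in rules:
        if seq[pos_i] == res_i and (seq[pos_j] == res_j) != must_match:
            return False
    return True


def tertiary_library(library, rules):
    """Keep only the sequences consistent with every correlation rule."""
    return [seq for seq in library if satisfies_correlations(seq, rules)]


rules = [
    (0, "A", 2, "L", True),   # positive: A at position 0 requires L at position 2
    (0, "G", 3, "D", False),  # negative: G at position 0 forbids D at position 3
]
print(tertiary_library(["AKLE", "AKME", "GKMD", "GKME"], rules))  # ['AKLE', 'GKME']
```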

Using the nucleic acids of the present invention which encode a GHA protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the GHA protein. The term “control sequences” refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is operably linked when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation.

In a preferred embodiment, when the endogenous secretory sequence leads to a low level of secretion of the naturally occurring protein or of the GHA protein, replacement of the naturally occurring secretory leader sequence is desired. In this embodiment, an unrelated secretory leader sequence is operably linked to a GHA protein encoding nucleic acid, leading to increased protein secretion. Thus, any secretory leader sequence resulting in enhanced secretion of the GHA protein, when compared to the secretion of hGH with its own secretory sequence, is desired. Suitable secretory leader sequences that lead to the secretion of a protein are known in the art.

In another preferred embodiment, the secretory leader sequence of a naturally occurring protein or of a GHA protein is removed by techniques known in the art, and subsequent expression results in intracellular accumulation of the recombinant protein.

Generally, “operably linked” means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the fusion protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus are preferably used to express the fusion protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention. In a preferred embodiment, the promoters are strong promoters allowing high expression in cells, particularly mammalian cells, such as the CMV promoter, particularly in combination with a Tet regulatory element.

In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.

A preferred expression vector system is a retroviral vector system such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby expressly incorporated by reference.

The GHA nucleic acids are introduced into the cells either alone or in combination with an expression vector. By “introduced into” or grammatical equivalents herein is meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type, discussed below. Exemplary methods include CaPO₄ precipitation, liposome fusion, Lipofectin®, electroporation, viral infection, etc. The GHA nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction, outlined below), or may exist either transiently or stably in the cytoplasm (i.e. through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.).

The GHA proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding a GHA protein, under the appropriate conditions to induce or cause expression of the GHA protein. The conditions appropriate for GHA protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, Pichia pastoris, etc.

In a preferred embodiment, the GHA proteins are expressed in mammalian cells. Mammalian expression systems are also known in the art, and include retroviral systems. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3′) transcription of a coding sequence for the fusion protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5′ end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV promoter.

Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3′ to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3′ terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, viral infection, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei. As outlined herein, a particularly preferred method utilizes retroviral infection, as described in PCT US97/01019, incorporated by reference.

As will be appreciated by those in the art, the type of mammalian cells used in the present invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although, as will be appreciated by those in the art, modification of the system by pseudotyping allows all eukaryotic cells to be used, preferably higher eukaryotes. As is more fully described below, a screen will be set up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide. Cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a peptide within the cell.

Accordingly, suitable cell types include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T cells and B cells), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, COS, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference.

In one embodiment, the cells may be additionally genetically engineered, that is, they may contain exogenous nucleic acid other than the GHA nucleic acid.

In a preferred embodiment, the GHA proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3′) transcription of the coding sequence of the GHA protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from the genes of sugar-metabolizing enzymes, such as those for galactose, lactose and maltose, and sequences derived from biosynthetic enzymes, such as those for tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
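The spacing rule above can be expressed as a simple check. The Python sketch below assumes the 5′ region is given as a DNA string, takes the first ATG as the initiation codon, and measures the gap between the 3′ end of a candidate Shine-Dalgarno motif and that codon. The motif AGGAGG used in the example is a commonly cited core sequence and, like the function name and test sequences, is an assumption for illustration.

```python
def sd_spacing_ok(region, sd_motif, min_gap=3, max_gap=11):
    """Check that sd_motif (3-9 nt) lies min_gap..max_gap nt upstream of the first ATG."""
    if not 3 <= len(sd_motif) <= 9:
        raise ValueError("SD motif is expected to be 3-9 nucleotides long")
    start = region.find("ATG")
    if start < 0:
        return False  # no initiation codon found
    # Search only the sequence upstream of the initiation codon.
    motif_at = region.rfind(sd_motif, 0, start)
    if motif_at < 0:
        return False
    gap = start - (motif_at + len(sd_motif))
    return min_gap <= gap <= max_gap


print(sd_spacing_ok("TTAGGAGGTAAAACATGGCT", "AGGAGG"))  # True (gap of 6 nt)
print(sd_spacing_ok("TTAGGAGGCATGGCT", "AGGAGG"))       # False (gap of 1 nt)
```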

The expression vector may also include a signal peptide sequence that provides for secretion of the GHA protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art. The protein is secreted either into the growth medium (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membranes of the cell (gram-negative bacteria). For expression in bacteria, bacterial secretory leader sequences operably linked to a GHA protein encoding nucleic acid are usually preferred.

The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptomyces lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, GHA proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular baculovirus-based expression vectors, are well known in the art.

In a preferred embodiment, GHA proteins are produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.

In addition, the GHA polypeptides of the invention may be further fused to other proteins, if desired, for example to increase expression or to stabilize the protein.

In one embodiment, the GHA nucleic acids, proteins and antibodies of the invention are labeled with a label other than the scaffold. By “labeled” herein is meant that a compound has at least one element, isotope or chemical compound attached to enable the detection of the compound. In general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes. The labels may be incorporated into the compound at any position.

Once made, the GHA proteins may be covalently modified. One type of covalent modification includes reacting targeted amino acid residues of a GHA polypeptide with an organic derivatizing agent that is capable of reacting with selected side chains or the N- or C-terminal residues of a GHA polypeptide. Derivatization with bifunctional agents is useful, for instance, for crosslinking a GHA protein to a water-insoluble support matrix or surface for use in the method for purifying anti-GHA antibodies or screening assays, as is more fully described below. Commonly used crosslinking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, for example, esters with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl esters such as 3,3′-dithiobis(succinimidylpropionate), bifunctional maleimides such as bis-N-maleimido-1,8-octane, and agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate.

Other modifications include deamidation of glutaminyl and asparaginyl residues to the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side chains [T. E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any C-terminal carboxyl group.

Another type of covalent modification of the GHA polypeptide included within the scope of this invention comprises altering the native glycosylation pattern of the polypeptide. “Altering the native glycosylation pattern” is intended for purposes herein to mean deleting one or more carbohydrate moieties found in native sequence GHA polypeptide, and/or adding one or more glycosylation sites that are not present in the native sequence GHA polypeptide.

Addition of glycosylation sites to GHA polypeptides may be accomplished by altering the amino acid sequence thereof. The alteration may be made, for example, by the addition of, or substitution by, one or more serine or threonine residues to the native sequence GHA polypeptide (for O-linked glycosylation sites). The GHA amino acid sequence may optionally be altered through changes at the DNA level, particularly by mutating the DNA encoding the GHA polypeptide at preselected bases such that codons are generated that will translate into the desired amino acids.

Another means of increasing the number of carbohydrate moieties on the GHA polypeptide is by chemical or enzymatic coupling of glycosides to the polypeptide. Such methods are described in the art, e.g., in WO 87/05330 published 11 Sep. 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981).

Removal of carbohydrate moieties present on the GHA polypeptide may be accomplished chemically or enzymatically, or by mutational substitution of codons encoding amino acid residues that serve as targets for glycosylation. Chemical deglycosylation techniques are known in the art and described, for instance, by Hakimuddin et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131 (1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth. Enzymol., 138:350 (1987).

Such derivatized moieties may improve the solubility, absorption, permeability across the blood-brain barrier, biological half-life, and the like. Such moieties or modifications of GHA polypeptides may alternatively eliminate or attenuate any possible undesirable side effect of the protein and the like. Moieties capable of mediating such effects are disclosed, for example, in Remington's Pharmaceutical Sciences, 16th ed., Mack Publishing Co., Easton, Pa. (1980).

Another type of covalent modification of GHA polypeptides comprises linking the GHA polypeptide to one of a variety of nonproteinaceous polymers, e.g., polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in U.S. Pat. Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337.

GHA polypeptides of the present invention may also be modified in a way to form chimeric molecules comprising a GHA polypeptide fused to another, heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric molecule comprises a fusion of a GHA polypeptide with a tag polypeptide which provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is generally placed at the amino- or carboxyl-terminus of the GHA polypeptide. The presence of such epitope-tagged forms of a GHA polypeptide can be detected using an antibody against the tag polypeptide. Also, provision of the epitope tag enables the GHA polypeptide to be readily purified by affinity purification using an anti-tag antibody or another type of affinity matrix that binds to the epitope tag. In an alternative embodiment, the chimeric molecule may comprise a fusion of a GHA polypeptide with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule.

Various tag polypeptides and their respective antibodies are well known in the art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags; the flu HA tag polypeptide and its antibody 12CA5 [Field et al., Mol. Cell. Biol. 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and 9E10 antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)]; and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al., Protein Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide [Hopp et al., BioTechnology 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al., Science 255:192-194 (1992)]; the tubulin epitope peptide [Skinner et al., J. Biol. Chem. 266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al., Proc. Natl. Acad. Sci. U.S.A. 87:6393-6397 (1990)].

In a preferred embodiment, the GHA protein is purified or isolated after expression. GHA proteins may be isolated or purified in a variety of ways known to those skilled in the art, depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the GHA protein may be purified using a standard anti-GHA antibody column. Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, N.Y. (1982). The degree of purification necessary will vary depending on the use of the GHA protein. In some instances no purification will be necessary.

Once made, the GHA proteins and nucleic acids of the invention find use in a number of applications. In a preferred embodiment, the GHA proteins are administered to a patient to treat a GH-associated disorder.

By “GH-associated disorder” or “GH responsive disorder or condition” herein is meant a disorder that can be ameliorated by the administration of a pharmaceutical composition comprising a GHA protein, including, but not limited to, dwarfism or growth delay, hypochondroplasia or idiopathic short stature; Turner's syndrome; growth delay in burned children; muscle wasting under conditions including, but not limited to, surgical stress, renal failure, muscular dystrophy, glucocorticoid administration or HIV infection; congestive heart failure or cardiovascular drug therapy; bone diseases or osteoporosis; disorders affecting puberty or reproduction; diffuse gastric bleeding; disorders relating to general anabolism, including, but not limited to, pseudoarthrosis, burn therapy, and cachectic states of old age; breast cancer; Prader-Willi syndrome; obesity; and Russell-Silver syndrome. Included within this definition is the use of a GHA protein in GH replacement therapies in GH deficient adults; GH therapy in elderly people; wound healing, including but not limited to stasis ulcers, decubitus ulcers, or diabetic ulcers; post-surgical (trauma) healing processes; total parenteral nutrition (TPN); and the reconstitution of the immune system.

In a preferred embodiment, a therapeutically effective dose of a GHA protein is administered to a patient in need of treatment. By “therapeutically effective dose” herein is meant a dose that produces the effects for which it is administered. The exact dose will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art using known techniques. In a preferred embodiment, dosages of about 5 μg/kg are used, administered either intravenously or subcutaneously. As is known in the art, adjustments for GHA protein degradation, systemic versus localized delivery, and rate of new protease synthesis, as well as the age, body weight, general health, sex, diet, time of administration, drug interaction and the severity of the condition may be necessary, and will be ascertainable with routine experimentation by those skilled in the art.
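The weight-based arithmetic implied by the preferred dosage can be sketched as follows. This is an illustration only: it assumes a molecular weight of about 22 kDa (the approximate value given for hGH in the Background, not a measured value for any GHA variant), the function names are hypothetical, and actual dosing would reflect the patient-specific adjustments listed above.

```python
"""Sketch: weight-based dose arithmetic at the ~5 ug/kg level described above."""

DOSE_UG_PER_KG = 5.0             # preferred dosage stated in the text
ASSUMED_MW_G_PER_MOL = 22_000.0  # assumption: ~22 kDa, approximated from hGH


def dose_micrograms(body_weight_kg: float, ug_per_kg: float = DOSE_UG_PER_KG) -> float:
    """Total dose (ug) = body weight (kg) x dose per kilogram (ug/kg)."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * ug_per_kg


def micrograms_to_nanomoles(dose_ug: float, mw_g_per_mol: float = ASSUMED_MW_G_PER_MOL) -> float:
    """Convert ug to nmol: (ug * 1e-6 g/ug) / (g/mol) * 1e9 nmol/mol."""
    return dose_ug * 1e-6 / mw_g_per_mol * 1e9


if __name__ == "__main__":
    total_ug = dose_micrograms(70.0)  # hypothetical 70 kg patient
    print(f"{total_ug:.0f} ug, about {micrograms_to_nanomoles(total_ug):.1f} nmol")
```

For a hypothetical 70 kg patient this gives 350 μg, or roughly 16 nmol under the 22 kDa assumption.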

A “patient” for the purposes of the present invention includes both humans and other animals, particularly mammals, and organisms. Thus the methods are applicable to both human therapy and veterinary applications. In the preferred embodiment the patient is a mammal, and in the most preferred embodiment the patient is human.

The term “treatment” in the instant invention is meant to include therapeutic treatment, as well as prophylactic or suppressive measures for the disease or disorder. Thus, for example, in the case of GH therapy in elderly people, successful administration of a GHA protein prior to onset of the disease or symptoms of a disease results in “treatment” of the disease. As another example, successful administration of a GHA protein after clinical manifestation of the disease to combat the symptoms of the disease comprises “treatment” of the disease. “Treatment” also encompasses administration of a GHA protein after the appearance of the disease in order to eradicate the disease. Successful administration of an agent after onset and after clinical symptoms have developed, with possible abatement of clinical symptoms and perhaps amelioration of the disease, comprises “treatment” of the disease.

Those “in need of treatment” include mammals already having the disease or disorder, as well as those prone to having the disease or disorder, including those in which the disease or disorder is to be prevented.

In another embodiment, a therapeutically effective dose of a GHA protein, a GHA gene, or a GHA antibody is administered to a patient having a disease involving inappropriate expression of GH. A “disease involving inappropriate expression of GH” within the scope of the present invention is meant to include diseases or disorders characterized by an overabundance of GH. This overabundance may be due to any cause, including, but not limited to, overexpression at the molecular level, prolonged or accumulated appearance at the site of action, or increased activity of GH relative to normal. Also included within this definition are diseases or disorders characterized by a reduction of GH. This reduction may be due to any cause, including, but not limited to, reduced expression at the molecular level, shortened or reduced appearance at the site of action, or decreased activity of GH relative to normal. Such an overabundance or reduction of GH can be measured relative to normal expression, appearance, or activity of GH according to, but not limited to, the assays described and referenced herein.

The administration of the GHA proteins of the present invention, preferably in the form of a sterile aqueous solution, can be done in a variety of ways, including, but not limited to, orally, subcutaneously, intravenously, intranasally, transdermally, intraperitoneally, intramuscularly, intrapulmonarily, vaginally, rectally, or intraocularly. In some instances, for example, in the treatment of wounds, inflammation, or multiple sclerosis, the GHA protein may be directly applied as a solution or spray. Depending upon the manner of introduction, the pharmaceutical composition may be formulated in a variety of ways. The concentration of the therapeutically active GHA protein in the formulation may vary from about 0.1 to 100 weight %. In another preferred embodiment, the concentration of the GHA protein is in the range of 0.003 to 1.0 molar, with dosages of 0.03, 0.05, 0.1, 0.2, and 0.3 millimoles per kilogram of body weight being preferred.

The pharmaceutical compositions of the present invention comprise a GHA protein in a form suitable for administration to a patient. In the preferred embodiment, the pharmaceutical compositions are in a water soluble form, such as being present as pharmaceutically acceptable salts, which is meant to include both acid and base addition salts. “Pharmaceutically acceptable acid addition salt” refers to those salts that retain the biological effectiveness of the free bases and that are not biologically or otherwise undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid, sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid, propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid and the like. “Pharmaceutically acceptable base addition salts” include those derived from inorganic bases such as sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, manganese, aluminum salts and the like. Particularly preferred are the ammonium, potassium, sodium, calcium, and magnesium salts. Salts derived from pharmaceutically acceptable organic non-toxic bases include salts of primary, secondary, and tertiary amines, substituted amines including naturally occurring substituted amines, cyclic amines and basic ion exchange resins, such as isopropylamine, trimethylamine, diethylamine, triethylamine, tripropylamine, and ethanolamine.

The pharmaceutical compositions may also include one or more of the following: carrier proteins such as serum albumin; buffers such as NaOAc; fillers such as microcrystalline cellulose, lactose, corn and other starches; binding agents; sweeteners and other flavoring agents; coloring agents; and polyethylene glycol. Additives are well known in the art, and are used in a variety of formulations.

In addition, in one embodiment, the GHA proteins of the present invention are formulated using a process for pharmaceutical compositions of recombinant GH as described in U.S. Pat. No. 5,612,315, which is hereby expressly incorporated by reference in its entirety.

In a further embodiment, the GHA proteins are added in a micellar formulation; see U.S. Pat. No. 5,833,948, hereby expressly incorporated by reference in its entirety.

Combinations of pharmaceutical compositions may be administered. Moreover, the compositions may be administered in combination with other therapeutics.

In one embodiment provided herein, antibodies, including but not limited to monoclonal and polyclonal antibodies, are raised against GHA proteins using methods known in the art. In a preferred embodiment, these anti-GHA antibodies are used for immunotherapy. Thus, methods of immunotherapy are provided. By “immunotherapy” is meant treatment of GH related disorders with an antibody raised against a GHA protein. As used herein, immunotherapy can be passive or active. Passive immunotherapy, as defined herein, is the passive transfer of antibody to a recipient (patient). Active immunization is the induction of antibody and/or T-cell responses in a recipient (patient). Induction of an immune response can be the consequence of providing the recipient with a GHA protein antigen to which antibodies are raised. As appreciated by one of ordinary skill in the art, the GHA protein antigen may be provided by injecting into a recipient a GHA polypeptide against which antibodies are desired to be raised, or by contacting the recipient with a GHA protein encoding nucleic acid, capable of expressing the GHA protein antigen, under conditions for expression of the GHA protein antigen.

In another preferred embodiment, a therapeutic compound is conjugated to an antibody, preferably an anti-GHA protein antibody. The therapeutic compound may be a cytotoxic agent. In this method, targeting the cytotoxic agent to tumor tissue or cells results in a reduction in the number of afflicted cells, thereby reducing symptoms associated with cancer and GHA protein related disorders. Cytotoxic agents are numerous and varied and include, but are not limited to, cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A chain, curcin, crotin, phenomycin, enomycin and the like. Cytotoxic agents also include radiochemicals made by conjugating radioisotopes to antibodies raised against cell cycle proteins, or by binding of a radionuclide to a chelating agent that has been covalently attached to the antibody.

In a preferred embodiment, GHA proteins are administered as therapeutic agents, and can be formulated as outlined above. Similarly, GHA genes (including the full-length sequence, partial sequences, or regulatory sequences of the GHA coding regions) can be administered in gene therapy applications, as is known in the art. These GHA genes can include antisense applications, either as gene therapy (i.e., for incorporation into the genome) or as antisense compositions, as will be appreciated by those in the art.

In a preferred embodiment, the nucleic acid encoding the GHA proteins may also be used in gene therapy. In gene therapy applications, genes are introduced into cells in order to achieve in vivo synthesis of a therapeutically effective genetic product, for example for replacement of a defective gene. “Gene therapy” includes both conventional gene therapy, where a lasting effect is achieved by a single treatment, and the administration of gene therapeutic agents, which involves the one-time or repeated administration of a therapeutically effective DNA or mRNA. Antisense RNAs and DNAs can be used as therapeutic agents for blocking the expression of certain genes in vivo. It has already been shown that short antisense oligonucleotides can be imported into cells where they act as inhibitors, despite their low intracellular concentrations caused by their restricted uptake by the cell membrane [Zamecnik et al., Proc. Natl. Acad. Sci. U.S.A. 83:4143-4146 (1986)]. The oligonucleotides can be modified to enhance their uptake, e.g., by substituting their negatively charged phosphodiester groups with uncharged groups.

A variety of techniques are available for introducing nucleic acids into viable cells. The techniques vary depending upon whether the nucleic acid is transferred into cultured cells in vitro, or in vivo in the cells of the intended host. Techniques suitable for the transfer of nucleic acid into mammalian cells in vitro include the use of liposomes, electroporation, microinjection, cell fusion, DEAE-dextran, the calcium phosphate precipitation method, etc. The currently preferred in vivo gene transfer techniques include transfection with viral (typically retroviral) vectors and viral coat protein-liposome mediated transfection [Dzau et al., Trends in Biotechnology 11:205-210 (1993)]. In some situations it is desirable to provide the nucleic acid source with an agent that targets the target cells, such as an antibody specific for a cell surface membrane protein or the target cell, a ligand for a receptor on the target cell, etc. Where liposomes are employed, proteins which bind to a cell surface membrane protein associated with endocytosis may be used for targeting and/or to facilitate uptake, e.g., capsid proteins or fragments thereof tropic for a particular cell type, antibodies for proteins which undergo internalization in cycling, and proteins that target intracellular localization and enhance intracellular half-life. The technique of receptor-mediated endocytosis is described, for example, by Wu et al., J. Biol. Chem. 262:4429-4432 (1987); and Wagner et al., Proc. Natl. Acad. Sci. U.S.A. 87:3410-3414 (1990). For review of gene marking and gene therapy protocols see Anderson et al., Science 256:808-813 (1992).

In a preferred embodiment, GHA genes are administered as DNA vaccines, either single genes or combinations of GHA genes. Naked DNA vaccines are generally known in the art; see Brower, Nature Biotechnology, 16:1304-1305 (1998). Methods for the use of genes as DNA vaccines are well known to one of ordinary skill in the art, and include placing a GHA gene or portion of a GHA gene under the control of a promoter for expression in a patient in need of treatment. The GHA gene used for DNA vaccines can encode full-length GHA proteins, but more preferably encodes portions of the GHA proteins, including peptides derived from the GHA protein. In a preferred embodiment a patient is immunized with a DNA vaccine comprising a plurality of nucleotide sequences derived from a GHA gene. Similarly, it is possible to immunize a patient with a plurality of GHA genes or portions thereof as defined herein. Without being bound by theory, upon expression of the polypeptide encoded by the DNA vaccine, cytotoxic T-cells, helper T-cells and antibodies are induced which recognize and destroy or eliminate cells expressing GH proteins.

In a preferred embodiment, the DNA vaccines include a gene encoding an adjuvant molecule with the DNA vaccine. Such adjuvant molecules include cytokines that increase the immunogenic response to the GHA polypeptide encoded by the DNA vaccine. Additional or alternative adjuvants are known to those of ordinary skill in the art and find use in the invention.

The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are incorporated by reference in their entirety.

EXAMPLE 1 Design and Characterization of Novel GHA Proteins by PDA

Summary: Sequences for novel growth hormone activity proteins (GHA proteins) were designed by optimizing residues in four regions of the protein (CORE, BOUNDARY1, BOUNDARY2 and CLUSTERED BOUNDARY) using Protein Design Automation (PDA) as described in WO98/47089, U.S. Ser. Nos. 09/058,459, 09/127,926, 60/104,612, 60/158,700, 09/419,351, 60/181,630, 60/186,904, and the U.S. patent application entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat), and PCT US98/07254, all of which are expressly incorporated by reference in their entirety. Several core designs were completed, with 21-45 residues considered, corresponding to 21²⁰-45²⁰ sequence possibilities. Residues unexposed to solvent were designed in order to minimize changes to the molecular surface and to limit the potential antigenicity of the designed novel protein analogues.

Calculations required 12-24 hours on 16 Silicon Graphics R10000 CPUs. The global optimum sequence from each design was selected for characterization. The examples herein describe GHA proteins that have from 6 to 26 amino acid substitutions when compared to the amino acid sequence of the mature hGH (190 amino acid residues) (SEQ ID NO:14).

Computational Protocols

Template Structure Preparation:

The template structure was produced using homology modeling. For this study the crystal structure of human growth hormone as deposited in the Brookhaven Protein Data Bank was used [PDB record 3HHR; de Vos et al., Science 255(5042):306-12 (1992)]. After all water and receptor molecules were removed and all hydrogen atoms were added to the protein, the resulting structure was minimized for 50 steps by the conjugate gradient method with the Dreiding force field. Electrostatic forces were turned off. PDA was then used to redesign four regions of the protein (CORE, BOUNDARY1, BOUNDARY2 and CLUSTERED BOUNDARY) with the aim of creating a thermostable protein with growth hormone activity (GHA protein).

Design Strategies:

Core residues and boundary residues were selected for design since optimization of these positions can improve stability, although stabilization has been obtained from modifications at other sites as well. Core designs also minimize changes to the molecular surface and thus limit the designed protein's potential for antigenicity.

PDA calculations were run on one core sequence (CORE), two boundary sequences (BOUNDARY1 and BOUNDARY2) and a clustered boundary sequence (CLUSTERED BOUNDARY). For details see FIGS. 3A-D and below.

PDA Calculations

Where possible, Dead End Elimination (DEE) was run to completion to find the PDA ground state. This was done for the PDA calculations for the A-chain and B-chain of Core 1, Core 2 and Core 2a, as defined below. For the calculation of Core 3, Core 4, Core 5, Core 6 and Core 7, DEE was aborted after the rotamer sequence space was reduced to fewer than 10²⁵ sequences. For each Core calculation, DEE was followed by Monte Carlo (MC) minimization, and a list of the 1000 lowest energy sequences was generated.
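The abort rule above is a statement about the product of the rotamers still alive at each position. A minimal sketch, assuming per-position counts of surviving rotamers are available from the DEE step; the counts used in the example and the function names are hypothetical.

```python
"""Sketch: size of the remaining rotamer sequence space during DEE, and the abort test."""
import math
from typing import Sequence

ABORT_LOG10 = 25.0  # text: DEE aborted once fewer than 10**25 sequences remain


def log10_sequence_space(rotamers_left: Sequence[int]) -> float:
    """log10 of the number of combinations, i.e. the product of surviving rotamers per position."""
    if any(n < 1 for n in rotamers_left):
        raise ValueError("every position must keep at least one rotamer")
    return sum(math.log10(n) for n in rotamers_left)


def switch_to_monte_carlo(rotamers_left: Sequence[int], threshold_log10: float = ABORT_LOG10) -> bool:
    """True once the remaining space is below 10**threshold_log10."""
    return log10_sequence_space(rotamers_left) < threshold_log10


if __name__ == "__main__":
    # Hypothetical counts for 45 CORE positions.
    print(round(log10_sequence_space([9] * 45), 2))  # 9 choices each, about 10**42.9
    print(switch_to_monte_carlo([3] * 45))           # about 10**21.5, below the threshold
```

Working in log10 avoids building numbers of the order of 10²⁵ or larger.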

The PDA calculations for all the designs were run using the a2h1p0 rotamer library. This library is based on the backbone-dependent rotamer library of Dunbrack and Karplus (Dunbrack and Karplus, J. Mol. Biol. 230(2):543-74 (1993); hereby expressly incorporated by reference) but includes more rotamers for the aromatic and hydrophobic amino acids: the χ₁ and χ₂ angle values of rotamers for all the aromatic amino acids, and the χ₁ angle values for all the other hydrophobic amino acids, were expanded ±1 standard deviation about the mean value reported in the Dunbrack and Karplus library. Typical PDA parameters were used: the van der Waals scale factor was set to 0.9, the H-bond potential well depth was set to 8.0 kcal/mol, the solvation potential was calculated using type 2 solvation with a nonpolar burial energy of 0.048 kcal/mol/Å² and a nonpolar exposure multiplication factor of 1.6, and the secondary structure scale factor was set to 0.0 (secondary structure propensities were not considered). Calculations required 12-24 hours on 16 Silicon Graphics R10000 CPUs.
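The rotamer expansion described above can be sketched as follows, reading "expanded ±1 standard deviation" as adding mean−sd and mean+sd variants for each expanded angle, so that an aromatic rotamer yields 3 × 3 = 9 variants and another hydrophobic rotamer yields 3. The mean and standard deviation values in the example are hypothetical placeholders for Dunbrack and Karplus library entries, and treating His as non-aromatic is an assumption.

```python
"""Sketch: expanding library rotamers by +/-1 standard deviation about the mean chi angles."""
from itertools import product
from typing import List, Sequence, Tuple

AROMATIC = {"Phe", "Tyr", "Trp"}                  # assumption: His not counted as aromatic here
OTHER_HYDROPHOBIC = {"Val", "Leu", "Ile", "Met"}  # residues with at least one chi angle


def chis_to_expand(residue: str) -> int:
    """chi1 and chi2 for aromatics, chi1 for other hydrophobics, none otherwise."""
    if residue in AROMATIC:
        return 2
    if residue in OTHER_HYDROPHOBIC:
        return 1
    return 0


def wrap(angle: float) -> float:
    """Map an angle in degrees onto [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def expand_rotamer(residue: str, chi_means: Sequence[float],
                   chi_sds: Sequence[float]) -> List[Tuple[float, ...]]:
    """All chi-angle combinations after expanding the selected chis to mean-sd, mean, mean+sd."""
    n = chis_to_expand(residue)
    options = []
    for k, (mu, sd) in enumerate(zip(chi_means, chi_sds)):
        options.append((mu - sd, mu, mu + sd) if k < n else (mu,))
    return [tuple(wrap(a) for a in combo) for combo in product(*options)]


if __name__ == "__main__":
    # Hypothetical mean/SD values standing in for library entries.
    print(len(expand_rotamer("Phe", [-65.0, 90.0], [10.0, 12.0])))  # 3 x 3 = 9
    print(len(expand_rotamer("Leu", [-65.0, 175.0], [9.0, 11.0])))  # 3 x 1 = 3
```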

Monte Carlo Analysis

Monte Carlo analysis of the sequences produced by PDA shows the ground state (optimal) amino acid and the amino acids allowed at each variable position, together with their frequencies of occurrence (see FIGS. 4 through 7).
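The per-position frequencies summarized in the figures can be computed from a ranked list of Monte Carlo sequences. A minimal sketch, assuming each MC result is available as an (energy, residues-at-designed-positions) pair; the data in the example are hypothetical, and the function names are illustrative.

```python
"""Sketch: per-position residue frequencies over the lowest-energy Monte Carlo sequences."""
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Scored = Tuple[float, Tuple[str, ...]]  # (energy in kcal/mol, residues at the designed positions)


def lowest_energy(scored: Sequence[Scored], n: int = 1000) -> List[Tuple[str, ...]]:
    """Keep the n lowest-energy sequences (the text analyzes the lowest 1000)."""
    return [seq for _, seq in sorted(scored, key=lambda item: item[0])[:n]]


def position_frequencies(sequences: Sequence[Tuple[str, ...]],
                         positions: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Fraction of sequences carrying each amino acid at each designed position."""
    total = len(sequences)
    table = {}
    for idx, pos in enumerate(positions):
        counts = Counter(seq[idx] for seq in sequences)
        table[pos] = {aa: c / total for aa, c in counts.most_common()}
    return table


if __name__ == "__main__":
    # Hypothetical toy output for three CORE positions.
    scored = [
        (-12.0, ("Val", "Val", "Phe")),
        (-11.5, ("Val", "Thr", "Tyr")),
        (-11.1, ("Ala", "Val", "Phe")),
        (-10.9, ("Val", "Val", "Tyr")),
    ]
    print(position_frequencies(lowest_energy(scored, n=4), [13, 27, 28]))
```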

EXAMPLE 2 The Design of the CORE Region

Different PDA calculations were performed for the core region of hGH. In these calculations the number of positions included in the PDA design was varied, and the effect of different PDA parameters on the resulting protein sequences, especially the ground state sequences, was analyzed (see below).

The residues in the structure of hGH were divided into core, boundary and surface categories. By visual inspection of the structure, the following positions were identified as belonging to the core of the protein: 6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78, 79, 80, 81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124, 157, 161, 162, 163, 166, 170, 173, 176, 177, 180, and 184.

Numbering of the residues follows that of the Brookhaven Protein Data Bank entry. A rotamer group was assigned to each CORE position, which allows the position to become any hydrophobic residue, i.e., Ala, Val, Leu, Ile, Phe, Tyr, Trp, Met or Ser. In the following PDA design, only the CORE residues were allowed to mutate to any rotamer of these amino acids. The rest of the protein was treated as a template with fixed coordinates.

Thus, the following positions/amino acid residues were included in the PDA design of the CORE (see also FIGS. 3A and 4A):

6   10  13  17  20  24  27  28  31  36  44  54  55  58  73  75  76  78  79  80
Leu Phe Ala Ala Leu Ala Thr Tyr Phe Ile Phe Phe Ser Ile Leu Leu Leu Ile Ser Leu

81  82  83  85  90  93  96  97  105 110 114 117 121 124 157 161 162 163 166 170
Leu Leu Ile Ser Val Leu Val Phe Ala Val Leu Leu Ile Leu Leu Gly Leu Leu Phe Met

173 176 177 180 184
Val Phe Leu Val Ser (SEQ ID NO:14)

An energy cutoff of 50 kcal/mol for the rotamer/template energy was used to exclude unfavorable rotamers. The van der Waals radius was scaled by a factor of 0.9, and solvation model 2 was used as defined by Street and Mayo [Fold. Des. 3(4):253-8 (1998)]. Distance-dependent electrostatics with a dielectric constant of 40 was used. The other parameters were as follows: a hydrogen bond well depth energy of 8 kcal/mol; a non-polar burial penalty energy of 0.048 kcal/mol/Å²; a non-polar exposure multiplication factor of 1.0; a polar burial penalty energy of 0.0 kcal/mol/Å²; and a polar hydrogen burial penalty energy of 2 kcal/mol. Amino acid type dependent entropy penalties were used to account for the entropic contribution to the free energy of unfolding.

The parameters were obtained by summing the side-chain entropy scale of Pickett & Sternberg [J. Mol. Biol. 231:825-839 (1993)] and the backbone scale of Stites & Pranata [Proteins 22:132-140 (1995)], referencing them to Gly (−1.92 kcal/mol), i.e., assuming that for glycine the backbone entropy change associated with the unfolding of an α-helix is 6.51 cal/(mol·K), which corresponds to TΔS = 1.92 kcal/mol at 295 K [D'Aquino et al., Proteins 25:143-156 (1996)], and weighting them by a factor of 2.3, obtained through minimization of the number of mutations when redesigning 45 core residues of hGH. The actual penalties (kcal/mol) are as follows: Ala, 2.7931; Cys, 5.0256; Asp, 6.6208; Glu, 7.1096; Phe, 5.0256; Gly, 4.4325; His, 6.1614; Hsp, 6.1614; Ile, 5.1094; Lys, 7.9331; Leu, 4.9429; Met, 6.9433; Asn, 7.6298; Pro, 2.5400; Gln, 8.1833; Arg, 7.9331; Ser, 7.6896; Thr, 7.5114; Val, 4.2845; Trp, 5.6374; Tyr, 5.9183.
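The glycine reference value and the penalty table can be checked with a few lines of arithmetic. A minimal sketch: the conversion multiplies the entropy change by the absolute temperature, and the penalty comparison assumes, for illustration only, that the tabulated values act as a simple additive per-residue term; how PDA combines them with its other energy terms is not modeled.

```python
"""Sketch: the unit conversion and the penalty table quoted above."""
from typing import Sequence

# Per-residue entropy penalties (kcal/mol), copied from the text.
ENTROPY_PENALTY = {
    "Ala": 2.7931, "Cys": 5.0256, "Asp": 6.6208, "Glu": 7.1096, "Phe": 5.0256,
    "Gly": 4.4325, "His": 6.1614, "Hsp": 6.1614, "Ile": 5.1094, "Lys": 7.9331,
    "Leu": 4.9429, "Met": 6.9433, "Asn": 7.6298, "Pro": 2.5400, "Gln": 8.1833,
    "Arg": 7.9331, "Ser": 7.6896, "Thr": 7.5114, "Val": 4.2845, "Trp": 5.6374,
    "Tyr": 5.9183,
}


def t_delta_s_kcal(delta_s_cal_per_mol_k: float, temperature_k: float) -> float:
    """T*dS in kcal/mol from dS in cal/(mol K): multiply by T, divide by 1000."""
    return delta_s_cal_per_mol_k * temperature_k / 1000.0


def penalty_difference(wild_type: Sequence[str], design: Sequence[str]) -> float:
    """Design-minus-wild-type change in summed penalties over substituted positions.

    Assumption: the penalties are treated as a simple additive per-residue term.
    """
    return sum(ENTROPY_PENALTY[d] - ENTROPY_PENALTY[w]
               for w, d in zip(wild_type, design) if w != d)


if __name__ == "__main__":
    print(round(t_delta_s_kcal(6.51, 295.0), 2))           # glycine reference, 1.92
    print(round(penalty_difference(["Gly"], ["Met"]), 4))  # e.g. a G161M substitution
```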

The best energy rotamer sequence was extracted from all possible rotamer sequences using the Dead End Elimination (DEE) method. In order to obtain other low energy sequences, a Monte Carlo search was performed starting from the DEE solution.
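The dead-end elimination step can be illustrated with the Goldstein single-residue criterion: rotamer r at position i is discarded when some alternative t at i has lower energy for every choice of the remaining rotamers, so r cannot occur in the global minimum energy conformation. The sketch below assumes an energy model made only of self and pairwise terms and uses random hypothetical energies; it is a generic illustration of the criterion, not the PDA software. An exhaustive search is included only to check the result on problems small enough to enumerate.

```python
"""Sketch: Goldstein single-residue dead-end elimination on a toy pairwise energy model."""
import random
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

SelfE = List[List[float]]                         # self_e[i][r]
PairE = Dict[Tuple[int, int], List[List[float]]]  # pair_e[(i, j)][r][s], stored for i < j


def pair(pair_e: PairE, i: int, r: int, j: int, s: int) -> float:
    """Symmetric lookup of pair energies stored only for i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_dee(self_e: SelfE, pair_e: PairE) -> List[Set[int]]:
    """Remove r at i when E_i(r) - E_i(t) + sum_j min_s [E_ij(r,s) - E_ij(t,s)] > 0 for some t."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # t always beats r, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(self_e: SelfE, pair_e: PairE, rots: Sequence[int]) -> float:
    return (sum(self_e[i][r] for i, r in enumerate(rots))
            + sum(m[rots[i]][rots[j]] for (i, j), m in pair_e.items()))


def brute_force_gmec(self_e: SelfE, pair_e: PairE) -> Tuple[int, ...]:
    """Exhaustive search, feasible only for toy problems; used to check DEE."""
    space = product(*(range(len(e)) for e in self_e))
    return min(space, key=lambda rots: total_energy(self_e, pair_e, rots))


def random_toy_problem(n_rot: Sequence[int], seed: int = 0) -> Tuple[SelfE, PairE]:
    """Hypothetical random energies for illustration only."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-2.0, 2.0) for _ in range(k)] for k in n_rot]
    pair_e = {(i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot[j])] for _ in range(n_rot[i])]
              for i in range(len(n_rot)) for j in range(i + 1, len(n_rot))}
    return self_e, pair_e


if __name__ == "__main__":
    self_e, pair_e = random_toy_problem([3, 4, 2, 5], seed=1)
    print("surviving rotamers:", goldstein_dee(self_e, pair_e))
    print("exhaustive minimum:", brute_force_gmec(self_e, pair_e))
```

Every rotamer of the exhaustive minimum always survives elimination; when DEE leaves exactly one rotamer per position, that assignment is the ground state.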

The PDA of the hGH CORE resulted in the following DEE ground state sequence (SEQ ID NO:3):

6   10  13  17  20  24  27  28  31  36  44  54  55  58  73  75  76  78  79  80
Leu Phe Val Ala Leu Ala Val Phe Phe Ile Phe Tyr Ala Ile Leu Leu Leu Ile Ala Leu

81  82  83  85  90  93  96  97  105 110 114 117 121 124 157 161 162 163 166 170
Leu Leu Ile Ala Ile Leu Val Phe Ala Val Met Leu Ile Leu Leu Met Leu Leu Phe Met

173 176 177 180 184
Val Phe Leu Val Ala (SEQ ID NO:3)

This sequence shows 11 mutations from the wild type hGH sequence: A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, L114M, G161M, and S184A (see also FIG. 4B) (SEQ ID NO:3).
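The mutation lists in these Examples follow from a position-by-position comparison of the wild-type and designed residues. A minimal sketch using the CORE position list and the two CORE sequences given above; the helper names are hypothetical, and the one-letter conversion uses the standard amino acid abbreviations.

```python
"""Sketch: deriving a mutation list by comparing wild-type and designed residues."""
from typing import List, Sequence

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q", "Glu": "E",
    "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F",
    "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

CORE_POSITIONS = [6, 10, 13, 17, 20, 24, 27, 28, 31, 36, 44, 54, 55, 58, 73, 75, 76, 78, 79, 80,
                  81, 82, 83, 85, 90, 93, 96, 97, 105, 110, 114, 117, 121, 124, 157, 161, 162,
                  163, 166, 170, 173, 176, 177, 180, 184]

# Wild-type CORE residues (SEQ ID NO:14) and the DEE ground state (SEQ ID NO:3), as tabulated above.
WT_CORE = ("Leu Phe Ala Ala Leu Ala Thr Tyr Phe Ile Phe Phe Ser Ile Leu Leu Leu Ile Ser Leu "
           "Leu Leu Ile Ser Val Leu Val Phe Ala Val Leu Leu Ile Leu Leu Gly Leu Leu Phe Met "
           "Val Phe Leu Val Ser").split()
DEE_CORE = ("Leu Phe Val Ala Leu Ala Val Phe Phe Ile Phe Tyr Ala Ile Leu Leu Leu Ile Ala Leu "
            "Leu Leu Ile Ala Ile Leu Val Phe Ala Val Met Leu Ile Leu Leu Met Leu Leu Phe Met "
            "Val Phe Leu Val Ala").split()


def list_mutations(positions: Sequence[int], wild_type: Sequence[str],
                   design: Sequence[str]) -> List[str]:
    """Return substitutions in wild-type/position/new-residue notation, e.g. 'A13V'."""
    if not len(positions) == len(wild_type) == len(design):
        raise ValueError("positions and sequences must have the same length")
    return [f"{THREE_TO_ONE[w]}{p}{THREE_TO_ONE[d]}"
            for p, w, d in zip(positions, wild_type, design) if w != d]


if __name__ == "__main__":
    muts = list_mutations(CORE_POSITIONS, WT_CORE, DEE_CORE)
    print(len(muts), ", ".join(muts))
```

Run as a script, it reports the 11 substitutions stated for SEQ ID NO:3. The same function applies to the BOUNDARY tables by swapping in their position lists and sequences.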

Other sequences, such as COREDESIGN1, COREDESIGN2, and COREDESIGN3, can be derived:

COREDESIGN1: A13V, T27V, S79A, V90I, G161M, and S184A (see also FIG. 4C) (SEQ ID NO:4).

6   10  13  17  20  24  27  28  31  36  44  54  55  58  73  75  76  78  79  80
Leu Phe Val Ala Leu Ala Val Tyr Phe Ile Phe Phe Ser Ile Leu Leu Leu Ile Ala Leu

81  82  83  85  90  93  96  97  105 110 114 117 121 124 157 161 162 163 166 170
Leu Leu Ile Ser Ile Leu Val Phe Ala Val Leu Leu Ile Leu Leu Met Leu Leu Phe Met

173 176 177 180 184
Val Phe Leu Val Ala (SEQ ID NO:4)

COREDESIGN2: A13V, T27V, S55A, S79A, S85A, V90I, G161M, and S184A (see also FIG. 4D) (SEQ ID NO:5).

6   10  13  17  20  24  27  28  31  36  44  54  55  58  73  75  76  78  79  80
Leu Phe Val Ala Leu Ala Val Tyr Phe Ile Phe Phe Ala Ile Leu Leu Leu Ile Ala Leu

81  82  83  85  90  93  96  97  105 110 114 117 121 124 157 161 162 163 166 170
Leu Leu Ile Ala Ile Leu Val Phe Ala Val Leu Leu Ile Leu Leu Met Leu Leu Phe Met

173 176 177 180 184
Val Phe Leu Val Ala (SEQ ID NO:5)

COREDESIGN3: A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, G161M, and S184A (see also FIG. 4E) (SEQ ID NO:6).

6   10  13  17  20  24  27  28  31  36  44  54  55  58  73  75  76  78  79  80
Leu Phe Val Ala Leu Ala Val Phe Phe Ile Phe Tyr Ala Ile Leu Leu Leu Ile Ala Leu

81  82  83  85  90  93  96  97  105 110 114 117 121 124 157 161 162 163 166 170
Leu Leu Ile Ala Ile Leu Val Phe Ala Val Leu Leu Ile Leu Leu Met Leu Leu Phe Met

173 176 177 180 184
Val Phe Leu Val Ala (SEQ ID NO:6)

Using the Monte Carlo technique, a list of low energy sequences was generated for the CORE. Analysis of the 1000 lowest energy protein sequences generated by Monte Carlo leads to the mutation pattern shown in FIG. 4A. Thus, any protein sequence showing mutations at the positions according to FIG. 4A will potentially generate a more stable and active GHA protein. In particular, those protein sequences found among the 50 lowest energy MC generated sequences (data not shown) have a high potential to result in a more stable and active GHA protein. Preferred GHA protein sequences are shown in FIGS. 4B to 4E (SEQ ID NOS:3-6).

EXAMPLE 3 The Design of the BOUNDARY1 and BOUNDARY2 Regions

Two sets of boundary residues, distributed approximately uniformly in the structure, were selected:

Positions for PDA analysis of BOUNDARY1: 6, 14, 26, 30, 32, 34, 35, 40, 50, 56, 57, 59, 66, 71, 74, 84, 92, 107, 109, 113, 118, 125, 130, 139, 143, 157, 158, and 183.

Thus, the following positions/amino acid residues were included in the PDA design of BOUNDARY1 (see also FIG. 3B):

6   14  26  30  32  34  35  40  50  56  57  59  66  71  74  84  92  107 109 113
Leu Met Asp Glu Glu Ala Tyr Gln Thr Glu Ser Pro Glu Ser Glu Gln Phe Asp Asn Leu

118 125 130 139 143 157 158 183
Glu Met Asp Phe Tyr Leu Lys Arg (SEQ ID NO:14)

Positions for PDA analysis of BOUNDARY2: 7, 29, 43, 70, 77, 87, 98, 100, 102, 104, 106, 111, 115, 132, 137, 140, 141, 142, 156, 159, 161, 184, 185, and 188.

Thus, the following positions/amino acid residues were included in the PDA design of BOUNDARY2 (see also FIG. 3C):

7   29  43  70  77  87  98  100 102 104 106 111 115 132 137 140 141 142 156 159
Ser Gln Ser Lys Arg Leu Ala Ser Val Gly Ser Tyr Lys Ser Ala Lys Ala Ala Leu Asn

161 184 185 188
Gly Ser Val Ser (SEQ ID NO:14)

The PDA sequence listed above includes A137, A141, and A142 instead of the Q137, Q141, and T142 of hGH. The following amino acid residues were modeled as Ala in the original x-ray structure (PDB entry 3HHR): T135, Q137, I138, Q141, T142, S144, K145, and D147. In the PDA designs, these positions were also kept as alanines.

The selections were obtained by filtering out the residues whose relative solvent accessible surface is less than 10% or more than 50%, and the residues which are closer than 5 Å to any atom of the receptor molecules in the 3HHR structure, followed by visual inspection.
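The selection rule can be written as a simple filter. A sketch assuming that per-residue relative solvent accessibility and the minimum distance to receptor atoms have already been computed from the 3HHR coordinates with some other tool; the record type, residue numbers and values in the example are hypothetical, and the final visual inspection is not modeled.

```python
"""Sketch: the solvent-accessibility and receptor-distance filter for boundary candidates."""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ResidueInfo:
    number: int               # residue number
    rel_sasa: float           # relative solvent-accessible surface, 0.0 to 1.0
    min_receptor_dist: float  # closest distance (Angstrom) to any receptor atom


def boundary_candidates(residues: Sequence[ResidueInfo], lo: float = 0.10, hi: float = 0.50,
                        min_dist: float = 5.0) -> List[int]:
    """Keep residues with lo <= relative SASA <= hi that are at least min_dist from the receptor."""
    return [r.number for r in residues
            if lo <= r.rel_sasa <= hi and r.min_receptor_dist >= min_dist]


if __name__ == "__main__":
    # Hypothetical values for four residues.
    demo = [ResidueInfo(1, 0.05, 12.0),   # too buried
            ResidueInfo(2, 0.30, 9.5),    # kept
            ResidueInfo(3, 0.25, 3.8),    # too close to the receptor
            ResidueInfo(4, 0.65, 11.0)]   # too exposed
    print(boundary_candidates(demo))
```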

Numbering of the residues follows that of the Brookhaven Protein Data Bank entry. A rotamer group was assigned to each position, which allows the position to become any of the following residues: Ala, Val, Leu, Ile, Phe, Tyr, Trp, Asp, Asn, Glu, Gln, Lys, Ser, Thr, Hsp, Arg, Met, His, or Gly. In the following PDA design, the residues in each set were allowed to mutate to any rotamer of the amino acids listed above. The rest of the protein was treated as a template with fixed coordinates. The two boundary sets were designed independently, i.e., while one set was being designed, the other set was included in the template.

An energy cutoff of 50 kcal/mol for the rotamer/template energy was used to exclude unfavorable rotamers. The van der Waals radius was scaled by a factor of 0.9, and solvation model 2 was used as defined by Street and Mayo [Fold. Des. 3(4):253-8 (1998)]. Distance-independent electrostatics was used, with a dielectric constant of 8 for the BOUNDARY1 region and 13 for the BOUNDARY2 region. The other parameters were as follows: a hydrogen bond well depth energy of 8 kcal/mol; a non-polar burial penalty energy of 0.048 kcal/mol/Å²; a non-polar exposure multiplication factor of 1.6; a polar burial penalty energy of 0.1125 kcal/mol/Å²; and a polar hydrogen burial penalty energy of 0 kcal/mol. Amino acid type dependent entropy penalties were used to account for the entropic contribution to the free energy of unfolding.
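The distance-dependent treatment used for the CORE and the distance-independent treatment used for the boundary regions can be compared with a Coulomb term. The sketch assumes the common convention in which a distance-dependent dielectric means ε(r) = ε·r, so the energy falls off as 1/r² rather than 1/r; 332.06 is the usual approximate conversion constant in kcal·Å/(mol·e²) for charges in electron units, and the charges and separation in the example are hypothetical.

```python
"""Sketch: distance-dependent vs distance-independent Coulomb terms."""

COULOMB_KCAL = 332.06  # approximate conversion, kcal*Angstrom/(mol*e^2)


def coulomb_distance_independent(qi: float, qj: float, r: float, eps: float) -> float:
    """E = 332.06 * qi * qj / (eps * r), with a constant dielectric eps."""
    return COULOMB_KCAL * qi * qj / (eps * r)


def coulomb_distance_dependent(qi: float, qj: float, r: float, eps: float) -> float:
    """E = 332.06 * qi * qj / (eps * r * r); assumes the eps(r) = eps * r convention."""
    return COULOMB_KCAL * qi * qj / (eps * r * r)


if __name__ == "__main__":
    r = 4.0  # hypothetical separation of two opposite unit charges, Angstrom
    print("CORE (distance-dependent, 40):", round(coulomb_distance_dependent(1, -1, r, 40.0), 2))
    for name, eps in [("BOUNDARY1", 8.0), ("BOUNDARY2", 13.0), ("CLUSTERED BOUNDARY", 10.5)]:
        print(name, round(coulomb_distance_independent(1, -1, r, eps), 2))
```

At this hypothetical separation the CORE setting damps the interaction far more strongly than any of the boundary settings.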

The best energy rotamer sequence was extracted from all possible rotamer sequences using the Dead End Elimination (DEE) method. In order to obtain other low energy sequences, a Monte Carlo search was performed starting from the DEE solution.
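The Monte Carlo step started from the DEE solution can be sketched as a Metropolis search with single-position moves that records every accepted sequence and returns the lowest-energy ones. This is an illustration under stated assumptions: the energy function, move set, temperature (kT = 0.6 kcal/mol, roughly RT near room temperature) and step count are hypothetical choices, not values given in the text.

```python
"""Sketch: Metropolis Monte Carlo search started from the DEE solution."""
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Seq = Tuple[str, ...]


def metropolis_search(energy: Callable[[Seq], float], start: Seq,
                      choices: Sequence[Sequence[str]], n_steps: int = 20000,
                      kT: float = 0.6, keep: int = 50, seed: int = 0) -> List[Tuple[Seq, float]]:
    """Single-position moves with Metropolis acceptance; returns the `keep` lowest-energy sequences seen."""
    rng = random.Random(seed)
    current = tuple(start)
    e_cur = energy(current)
    seen: Dict[Seq, float] = {current: e_cur}
    for _ in range(n_steps):
        i = rng.randrange(len(current))
        trial = current[:i] + (rng.choice(choices[i]),) + current[i + 1:]
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with probability exp(-dE/kT).
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            seen[current] = e_cur
    return sorted(seen.items(), key=lambda item: item[1])[:keep]


if __name__ == "__main__":
    # Hypothetical additive toy energy over three positions (kcal/mol).
    table = {"Ala": 1.0, "Val": 0.0, "Leu": 0.5}
    toy_energy = lambda seq: sum(table[aa] for aa in seq)
    options = [["Ala", "Val", "Leu"]] * 3
    for seq, e in metropolis_search(toy_energy, ("Val", "Val", "Val"), options, keep=5):
        print(seq, e)
```

Starting from the DEE solution means the search begins at the lowest-energy sequence already found and samples its low-energy neighborhood.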

The PDA of hGH BOUNDARY1 resulted in the following DEE ground state sequence (SEQ ID NO:7):

6   14  26  30  32  34  35  40  50  56  57  59  66  71  74  84  92  107 109 113
Leu Leu Ala Val Glu Trp Tyr Lys Phe Glu Glu Val Glu His Glu Arg Glu Ala Phe Leu

118 125 130 139 143 157 158 183
Leu Ile Arg His Asp Leu Lys His (SEQ ID NO:7)

The energy is −29.62 kcal/mol. This sequence shows 19 mutations from the wild type hGH sequence: M14L, D26A, E30V, A34W, Q40K, T50F, S57E, P59V, S71H, Q84R, F92E, D107A, N109F, E118L, M125I, D130R, F139H, Y143D, and R183H (see also FIG. 5B) (SEQ ID NO:7).

The lowest energy sequence from the Monte Carlo calculation for BOUNDARY1 is as follows (SEQ ID NO:8):

6   14  26  30  32  34  35  40  50  56  57  59  66  71  74  84  92  107 109 113
Leu Leu Ala Trp Glu Lys Glu Lys Phe Glu Lys Glu Glu His Glu Arg Arg Asp Phe Leu

118 125 130 139 143 157 158 183
Leu Ile Arg His Asp Leu Phe His (SEQ ID NO:8)

The energy is −37.135 kcal/mol. This sequence shows 20 mutations from the wild type hGH sequence: M14L, D26A, E30W, A34K, Y35E, Q40K, T50F, S57K, P59E, S71H, Q84R, F92R, N109F, E118L, M125I, D130R, F139H, Y143D, K158F, and R183H (see also FIG. 5C) (SEQ ID NO:8).

Using the Monte Carlo technique, a list of low energy sequences was generated for BOUNDARY1. Analysis of the 1000 lowest energy protein sequences generated by Monte Carlo leads to the mutation pattern shown in FIG. 5A. Thus, any protein sequence showing mutations at the positions according to FIG. 5A will potentially generate a more stable and active GHA protein. In particular, those protein sequences found among the 50 lowest energy MC generated sequences (data not shown) have a high potential to result in a more stable and active GHA protein. Preferred GHA protein sequences are shown in FIGS. 5B and 5C (SEQ ID NOS:7-8).

The PDA analysis of hGH BOUNDARY2 resulted in the following DEE ground state sequence (SEQ ID NO:9):

7   29  43  70  77  87  98  100 102 104 106 111 115 132 137 140 141 142 156 159
Lys Lys Lys Lys Met Leu Val Ala Val Gly Lys Arg Lys Ala Trp Lys Lys Val Leu Phe

161 184 185 188
Met Ala Val Ala (SEQ ID NO:9)

The energy is 16.894 kcal/mol. This sequence shows 16 mutations from the wild type hGH sequence: S7K, Q29K, S43K, R77M, A98V, S100A, S106K, Y111R, S132A, A137W, A141K, A142V, N159F, G161M, S184A, and S188A (see also FIG. 6B) (SEQ ID NO:9).

Using the Monte Carlo technique, a list of low energy sequences was generated for BOUNDARY2. Analysis of the 1000 lowest energy protein sequences generated by Monte Carlo leads to the mutation pattern shown in FIG. 6A. Thus, any protein sequence showing mutations at the positions according to FIG. 6A will potentially generate a more stable and active GHA protein. In particular, those protein sequences found among the 50 lowest energy MC generated sequences (data not shown) have a high potential to result in a more stable and active GHA protein. A preferred GHA protein sequence is shown in FIG. 6B (SEQ ID NO:9).

EXAMPLE 4 The Design of the CLUSTERED BOUNDARY Region

To simplify experimental expression of GHA proteins, another type of boundary design was performed. Twenty-one residues, clustered in three groups, were chosen from all residues classified as boundary. This set is referred to here as the CLUSTERED BOUNDARY region.

Positions for PDA analysis of the CLUSTERED BOUNDARY region: 26, 29, 30, 34, 40, 43, 50, 77, 84, 92, 100, 102, 111, 118, 125, 132, 137, 139, 141, 142, and 143.

All other boundary residues were allowed to “float” during the calculations, i.e., they could adopt rotamers of the wild-type amino acid but could not change identity. The PDA sequence listed above includes A137, A141, and A142 instead of Q137, Q141, and T142 of hGH. The following residues were modeled as Ala in the original x-ray structure (PDB entry 3HHR): T135, Q137, I138, Q141, T142, S144, K145, and D147. In this PDA design, three of these residues (positions 137, 141, and 142) were included in the calculations. The others were forced to “float” and keep their wild-type identity, with the exception of position 145, which was kept as alanine.

Thus, the following positions and amino acid residues were included in the PDA design of the CLUSTERED BOUNDARY region (see also FIG. 3D):

7 14 26 29 30 34 40 43 50 57 70 77 84 87 92 98 100 102 104 106

Ser Met Asp Gln Glu Ala Gln Ser Thr Ser Lys Arg Gln Leu Phe Ala Ser Val Gly Ser

109 111 115 118 125 132 135 137 138 140 141 142 143 144 145 147 156 159 161 184

Asn Tyr Lys Glu Met Ser Thr Gln Ile Lys Gln Thr Tyr Ser Lys Asp Leu Asn Gly Ser

185 188

Val Ser (SEQ ID NO:1)

The calculation parameters were as follows: the cutoff for the rotamer/template energy was 50 kcal/mol; the van der Waals radii were scaled by a factor of 0.9; the distance-independent dielectric constant was 10.5; solvation model 2 was used; the hydrogen bond well depth was 8 kcal/mol; the non-polar burial penalty was 0.048 kcal/mol/Å²; the non-polar exposure multiplication factor was 1.6; the polar burial penalty was 0.144 kcal/mol/Å²; the polar hydrogen burial penalty was 0 kcal/mol; and amino acid type-dependent entropy penalties were used to account for the entropic contribution to the free energy of unfolding.
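Keeping these settings together with each design run makes it easier to reproduce. The sketch below records the listed values in a Python dataclass and applies the 50 kcal/mol rotamer/template cutoff as a pre-filter before sequence search. Treating the cutoff as an inclusive pre-filter is an assumption, and `PDAParameters` and `prune_rotamers` are hypothetical names. The solvation terms are only stored, not evaluated, because the exact form of solvation model 2 is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDAParameters:
    """Settings listed for the CLUSTERED BOUNDARY calculation (record only)."""
    rotamer_template_cutoff: float = 50.0         # kcal/mol
    vdw_radius_scale: float = 0.9
    dielectric_constant: float = 10.5             # distance-independent
    solvation_model: int = 2
    hbond_well_depth: float = 8.0                 # kcal/mol
    nonpolar_burial_penalty: float = 0.048        # kcal/mol/Å^2
    nonpolar_exposure_factor: float = 1.6
    polar_burial_penalty: float = 0.144           # kcal/mol/Å^2
    polar_hydrogen_burial_penalty: float = 0.0    # kcal/mol
    use_aa_entropy_penalties: bool = True


def prune_rotamers(template_energies, params):
    """Keep only rotamers whose rotamer/template energy is within the cutoff.

    template_energies: dict position -> dict rotamer_id -> energy (kcal/mol)
    """
    return {
        pos: {rot: e for rot, e in rots.items() if e <= params.rotamer_template_cutoff}
        for pos, rots in template_energies.items()
    }


print(prune_rotamers({10: {"a": 12.0, "b": 75.3}, 20: {"c": 50.0}}, PDAParameters()))
```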

The lowest-energy rotamer sequence was extracted from all possible rotamer sequences using the Dead End Elimination (DEE) method. To obtain other low-energy sequences, a Monte Carlo search was performed starting from the DEE solution.
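The text names DEE without specifying the elimination criterion. As an illustration, the sketch below applies the original single-rotamer criterion of Desmet et al.: rotamer r at position i is discarded when even its most favorable interactions give a higher energy than the least favorable interactions of some alternative rotamer t at the same position. Self (rotamer/template) and pairwise energies are assumed to be precomputed; the data layout and the function name are hypothetical.

```python
def dead_end_elimination(self_e, pair_e):
    """Iterate the Desmet singles DEE criterion until no rotamer can be removed.

    self_e[i][r]: energy of rotamer r at position i with the fixed template
    pair_e[(i, j)][r][s]: interaction of rotamer r at i with rotamer s at j (i < j)
    Returns alive[i], the set of surviving rotamer indices at each position.
    Rotamers removed this way cannot belong to the global minimum energy conformation.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Most optimistic energy for r: best partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i)
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Most pessimistic energy for t: worst partner at every other position.
                    worst_t = self_e[i][t] + sum(
                        max(pair(i, t, j, s) for s in alive[j]) for j in range(n) if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Two positions with two rotamers each; only rotamer 0 survives at both.
print(dead_end_elimination([[0.0, 5.0], [0.0, 4.0]], {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}))
```

When exactly one rotamer survives at every position, the survivors form the DEE ground state; otherwise the remaining combinations must still be searched.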

The PDA of hGH CLUSTERED BOUNDARY resulted in the following DEE ground state sequence (SEQ ID NO:10):

7 14 26 29 30 34 40 43 50 57 70 77 84 87 92 98 100 102 104 106

Ser Met Lys Ile Val Trp Val Lys Phe Ser Lys Met Met Leu Val Ala Ala Ile Gly Ser

109 111 115 118 125 132 135 137 138 140 141 142 143 144 145 147 156 159 161 184

Phe Arg Lys Met Ile Ala Thr Arg Ile Lys Phe Val Val Ser Ala Asp Leu Asn Gly Ser

185 188

Val Ser (SEQ ID NO:10)

This sequence shows 22 mutations from the wild-type hGH sequence: D26K, Q29I, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, Q137R, Q141F, T142V, Y143V, and K145A (see also FIG. 7B) (SEQ ID NO:10).
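Each mutation list in this example is obtained by comparing the designed residue at each position with the wild-type residue at the same position. The minimal Python sketch below performs that comparison using the wild-type residues listed for the CLUSTERED BOUNDARY positions (SEQ ID NO:1) and the DEE ground state (SEQ ID NO:10). The helper name `list_mutations` is hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def list_mutations(positions, wild_type, design):
    """Return substitutions such as 'D26K' wherever the design differs from wild type."""
    wt, new = wild_type.split(), design.split()
    if not (len(positions) == len(wt) == len(new)):
        raise ValueError("positions, wild type and design must have the same length")
    return [f"{THREE_TO_ONE[a]}{p}{THREE_TO_ONE[b]}"
            for p, a, b in zip(positions, wt, new) if a != b]


POSITIONS = [7, 14, 26, 29, 30, 34, 40, 43, 50, 57, 70, 77, 84, 87, 92, 98, 100, 102, 104, 106,
             109, 111, 115, 118, 125, 132, 135, 137, 138, 140, 141, 142, 143, 144, 145, 147,
             156, 159, 161, 184, 185, 188]
WILD_TYPE = ("Ser Met Asp Gln Glu Ala Gln Ser Thr Ser Lys Arg Gln Leu Phe Ala Ser Val Gly Ser "
             "Asn Tyr Lys Glu Met Ser Thr Gln Ile Lys Gln Thr Tyr Ser Lys Asp Leu Asn Gly Ser "
             "Val Ser")
DEE_GROUND_STATE = ("Ser Met Lys Ile Val Trp Val Lys Phe Ser Lys Met Met Leu Val Ala Ala Ile Gly Ser "
                    "Phe Arg Lys Met Ile Ala Thr Arg Ile Lys Phe Val Val Ser Ala Asp Leu Asn Gly Ser "
                    "Val Ser")

mutations = list_mutations(POSITIONS, WILD_TYPE, DEE_GROUND_STATE)
print(len(mutations), ", ".join(mutations))
```

Run as written, it reports 22 substitutions, matching the count stated above, including A34W. The same call can be applied to the residue rows of BOUNDARYDESIGN1-3.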

Other sequences, such as BOUNDARYDESIGN1, BOUNDARYDESIGN2, and BOUNDARYDESIGN3, can be derived.

BOUNDARYDESIGN1 (SEQ ID NO:11):

7 14 26 29 30 34 40 43 50 57 70 77 84 87 92 98 100 102 104 106

Ser Met Lys Ile Val Trp Trp Trp Phe Ser Lys Met Met Leu Val Ala Ala Ile Gly Ser

109 111 115 118 125 132 135 137 138 140 141 142 143 144 145 147 156 159 161 184

Phe Arg Lys Met Ile Ala Ala Arg Ala Lys Phe Val Val Ala Ala Ala Leu Asn Gly Ser

185 188

Val Ser (SEQ ID NO:11)

This sequence shows 26 mutations from the wild-type hGH sequence: D26K, Q29I, E30V, A34W, Q40W, S43W, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see also FIG. 7C) (SEQ ID NO:11).

BOUNDARYDESIGN2 (SEQ ID NO:12):

7 14 26 29 30 34 40 43 50 57 70 77 84 87 92 98 100 102 104 106

Ser Met Lys Ile Val Trp Val Lys Phe Ser Lys Met Met Leu Val Ala Ala Ile Gly Ser

109 111 115 118 125 132 135 137 138 140 141 142 143 144 145 147 156 159 161 184

Phe Arg Lys Met Ile Ala Ala Arg Ala Lys Phe Val Val Ala Ala Ala Leu Asn Gly Ser

185 188

Val Ser (SEQ ID NO:12)

This sequence shows 26 mutations from the wild-type hGH sequence: D26K, Q29I, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, V102I, N109F, Y111R, E118M, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see also FIG. 7D) (SEQ ID NO:12).

BOUNDARYDESIGN3 (SEQ ID NO:13):

7 14 26 29 30 34 40 43 50 57 70 77 84 87 92 98 100 102 104 106

Ser Met Glu Lys Val Trp Val Lys Phe Ser Lys Met Met Leu Val Ala Ala Val Gly Ser

109 111 115 118 125 132 135 137 138 140 141 142 143 144 145 147 156 159 161 184

Phe Arg Lys Lys Ile Ala Ala Arg Ala Lys Phe Val Val Ala Ala Ala Leu Asn Gly Ser

185 188

Val Ser (SEQ ID NO:13)

This sequence shows 25 mutations from the wild-type hGH sequence: D26E, Q29K, E30V, A34W, Q40V, S43K, T50F, R77M, Q84M, F92V, S100A, N109F, Y111R, E118K, M125I, S132A, T135A, Q137R, I138A, Q141F, T142V, Y143V, S144A, K145A, and D147A (see also FIG. 7E) (SEQ ID NO:13).

Using the Monte Carlo (MC) technique, a list of low-energy sequences was generated for CLUSTERED BOUNDARY. Analysis of the 1000 lowest-energy sequences generated by Monte Carlo yields the mutation pattern shown in FIG. 7A. Thus, any protein sequence carrying mutations at the positions indicated in FIG. 7A may yield a more stable and active GHA protein. In particular, sequences found among the 50 lowest-energy MC sequences (data not shown) have a high potential to result in a more stable and active GHA protein. Preferred GHA protein sequences are shown in FIGS. 7B and 7E (SEQ ID NOS:10 and 13).
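The Monte Carlo search is described only as starting from the DEE solution; no move set, temperature, or step count is given. The sketch below assumes a simple Metropolis walk in which each step reassigns one randomly chosen position to a random rotamer. `kT`, `n_steps`, `keep`, and the function names are illustrative assumptions. The walk records the best energy seen for each distinct amino acid sequence and returns the lowest `keep` of them, analogous to the list of the 1000 lowest-energy sequences.

```python
import math
import random


def total_energy(conf, self_e, pair_e):
    """Self energies plus all pairwise energies for one rotamer assignment."""
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    for i in range(len(conf)):
        for j in range(i + 1, len(conf)):
            e += pair_e[(i, j)][conf[i]][conf[j]]
    return e


def monte_carlo_sequences(start, self_e, pair_e, rot_aa,
                          n_steps=10000, kT=0.6, keep=1000, seed=0):
    """Metropolis sampling from `start` (e.g. the DEE solution).

    rot_aa[i][r] is the one-letter amino acid of rotamer r at position i.
    Returns up to `keep` (energy, sequence) pairs, lowest energy first.
    """
    rng = random.Random(seed)
    best = {}

    def record(conf, energy):
        seq = "".join(rot_aa[i][r] for i, r in enumerate(conf))
        if seq not in best or energy < best[seq]:
            best[seq] = energy

    conf = list(start)
    energy = total_energy(conf, self_e, pair_e)
    record(conf, energy)
    for _ in range(n_steps):
        i = rng.randrange(len(conf))
        old = conf[i]
        conf[i] = rng.randrange(len(self_e[i]))
        new_energy = total_energy(conf, self_e, pair_e)  # full recompute: simple, not fast
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            record(conf, energy)
        else:
            conf[i] = old  # reject the move
    return sorted((e, s) for s, e in best.items())[:keep]
```

Because the walk starts at the DEE ground state, the first entry of the returned list is that ground state; the remaining entries are the nearby low-energy alternatives.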

EXAMPLE 5 GHA Protein Expression and Refolding

GHA proteins of the invention were expressed in E. coli using standard protocols (see, e.g., Sambrook et al., supra; Ausubel et al., supra). Inclusion bodies were prepared as known in the art. Approximately 0.5 g of wet inclusion bodies was resuspended in 5 ml of wash buffer A (100 mM Tris/HCl, pH 8.0; 2% Triton; 4 M urea; 5 mM EDTA; 0.5 mM DTT), mixed, vortexed, and centrifuged at 20,000 g for 30 min. The pellet was washed in buffer B (100 mM Tris/HCl, pH 8.0; 0.5 mM DTT), mixed, vortexed, and centrifuged at 20,000 g for 30 min. The pellet was resuspended in extraction buffer (50 mM glycine; 0.0156 M NaOH; 5 mM reduced GSH; 8 M GdnHCl, pH 9.6) at 3 ml/g pellet. The proteins were dispersed by sonication (tip midway in solution; output control: 7; duty cycle: 80%; 10-second pulses on ice). The sample was centrifuged (20,000 g for 30 min) to remove insoluble debris. The protein concentration of the supernatant was measured and, if necessary, adjusted to 2 mg protein/ml. The supernatant was dialyzed for 12-16 hours or overnight against folding buffer A (50 mM glycine; 0.0156 M NaOH; 10% sucrose; 1 mM EDTA; 1 mM reduced GSH; 0.1 mM oxidized GSSG; 4 M urea, pH 9.6). After dialysis against folding buffer B (60 mM Tris, pH 9.6; 10% sucrose; 1 mM EDTA; 0.1 mM reduced GSH; 0.01 mM oxidized GSSG), the supernatant was filtered and purified further by column chromatography (HPLC-SE).
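Two steps in this protocol are simple volume calculations. The minimal sketch below assumes that the adjustment to 2 mg protein/ml is a dilution with additional buffer (C₁V₁ = C₂V₂). The function names and the example numbers are hypothetical.

```python
def extraction_buffer_volume_ml(pellet_g, ml_per_g=3.0):
    """Extraction buffer volume at the stated ratio of 3 ml per gram of pellet."""
    return pellet_g * ml_per_g


def diluent_to_target_ml(conc_mg_per_ml, volume_ml, target_mg_per_ml=2.0):
    """Buffer volume to add so that the protein ends at the target concentration.

    Assumes the adjustment is a dilution (C1 * V1 = C2 * V2). A sample already at
    or below the target is returned unchanged (0 ml), since dilution cannot raise it.
    """
    if conc_mg_per_ml <= target_mg_per_ml:
        return 0.0
    return conc_mg_per_ml * volume_ml / target_mg_per_ml - volume_ml


# Hypothetical example: a 0.5 g pellet, supernatant measured at 5 mg/ml.
v = extraction_buffer_volume_ml(0.5)
print(v, diluent_to_target_ml(5.0, v))
```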

EXAMPLE 6 Thermal Stability of GHA Proteins

GHA proteins expressed and purified as described herein were analyzed for thermal stability and compared to hGH. The GHA proteins tested were mutant b (A13V, T27V, S79A, V90I, G161M, and S184A) (SEQ ID NO:4), mutant d (A13V, T27V, S55A, S79A, S85A, V90I, G161M, and S184A) (SEQ ID NO:5), and mutant f (A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, G161M, and S184A) (SEQ ID NO:6).

The far-ultraviolet (UV) CD spectra of hGH and of GHA protein mutants b, d, and f were nearly identical, indicating highly similar secondary structure and tertiary folds (data not shown). Thermal denaturation was monitored at 220 nm, and the melting temperatures (T_m values) were derived from the derivative curve of the ellipticity at 220 nm versus temperature. Relative to hGH, the T_m of GHA protein mutant b (SEQ ID NO:4) increased by 16° C. and that of GHA protein mutant d (SEQ ID NO:5) by 13° C. (FIG. 13).
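The T_m is read from the derivative of ellipticity with respect to temperature. The sketch below computes that slope with NumPy's `numpy.gradient`, which also accepts unevenly spaced temperature points, and takes the temperature of the steepest change. Real CD melts are often smoothed or fitted before differentiation, which this sketch omits. The function name and the synthetic curves are hypothetical.

```python
import numpy as np


def melting_temperature(temps_c, ellipticity):
    """T_m as the temperature where |d(ellipticity)/dT| is largest.

    Uses finite differences (numpy.gradient), so the result is limited to the
    spacing of the temperature grid.
    """
    t = np.asarray(temps_c, dtype=float)
    theta = np.asarray(ellipticity, dtype=float)
    slope = np.gradient(theta, t)
    return float(t[np.argmax(np.abs(slope))])


# Synthetic two-state melts (arbitrary units), sampled every 0.5 °C.
T = np.arange(20.0, 95.5, 0.5)
wild_type_like = -20.0 + 15.0 / (1.0 + np.exp(-(T - 62.0) / 2.5))
stabilized = -20.0 + 15.0 / (1.0 + np.exp(-(T - 78.0) / 2.5))
print(melting_temperature(T, stabilized) - melting_temperature(T, wild_type_like))
```

Taking the absolute slope makes the result independent of whether the ellipticity rises or falls on unfolding.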

EXAMPLE 7 Cell Proliferation Assays

Cell proliferation assays were performed using the interleukin 3-dependent murine myeloid cell line FDC-P1 stably transfected with the full-length human growth hormone receptor (hGHR), according to the method of Rowlinson et al. [J. Biol. Chem. 270(28):16833-16839 (1995); Endocrinology 137(1):90-5 (1996)]. Cells were maintained in RPMI-1640 medium with 5% fetal calf serum (FCS), 1 μg/ml gentamicin, and 50 units/ml interleukin 3 (IL-3). In preparation for the assay, exponentially growing cells were washed twice in PBS and resuspended in hGH-free and phenol red-free RPMI-1640 medium with 5% FCS and 1 μg/ml gentamicin. Cells were then added to microtiter plates containing serially diluted hGH or GHA proteins, for a final density of 2.6×10⁵ cells/well. The plates were incubated at 37° C. in 5% CO₂. After 24 hours, cell growth was quantified by the reduction of a tetrazolium salt (MTT assay). All GHA protein mutants were assayed in triplicate on the same plate, and the assay was repeated three times. EC₅₀ values were determined as described by Young et al. [Protein Science 6:1228-1236 (1997)] using Kaleidagraph (Synergy Software), by nonlinear least-squares fit to the four-parameter equation:

$$\mathrm{OD} = \mathrm{OD}_{\max} - \frac{\mathrm{OD}_{\max} - \mathrm{OD}_{\min}}{1 + \left([\mathrm{concentration}]/\mathrm{EC}_{50}\right)^{n}}$$

where OD_max and OD_min are the upper and lower plateaus of the response and n is the slope factor.
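The EC₅₀ fit was performed in Kaleidagraph. As a stand-in, the same four-parameter equation can be fitted with SciPy's `scipy.optimize.curve_fit`. The starting guesses and the parameter bounds below are assumptions chosen to keep EC₅₀ and n positive, and the starting EC₅₀ is taken from the positive concentrations only. The function names and synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_parameter(conc, od_max, od_min, ec50, n):
    """OD = OD_max - (OD_max - OD_min) / (1 + (conc / EC50)^n)."""
    return od_max - (od_max - od_min) / (1.0 + (conc / ec50) ** n)


def fit_ec50(conc, od):
    """Least-squares fit; returns (od_max, od_min, ec50, n) and their standard errors."""
    conc = np.asarray(conc, dtype=float)
    od = np.asarray(od, dtype=float)
    positive = conc[conc > 0]
    # Starting point: plateaus from the data, EC50 at the geometric mid-range, slope 1.
    p0 = [od.max(), od.min(), float(np.sqrt(positive.min() * positive.max())), 1.0]
    bounds = ([-np.inf, -np.inf, 1e-12, 0.1], [np.inf, np.inf, np.inf, 10.0])
    params, cov = curve_fit(four_parameter, conc, od, p0=p0, bounds=bounds)
    return params, np.sqrt(np.diag(cov))


# Synthetic, noise-free twofold dilution series (arbitrary concentration units).
conc = 10.0 * 2.0 ** np.arange(12)
od = four_parameter(conc, 1.2, 0.1, 250.0, 1.5)
params, errors = fit_ec50(conc, od)
print(params)
```

With replicate wells, the mean OD per concentration could be fitted, or all replicates passed directly; the standard errors come from the diagonal of the covariance matrix returned by `curve_fit`.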

The activity of hGH and GHA protein mutants b, d, and f (SEQ ID NOS:4-6) was determined using the assay described above. The EC₅₀ values were as follows: wild-type hGH, 220±20; GHA protein mutant b, 320±30; GHA protein mutant d, 260±50; and GHA protein mutant f, 230±50.

1. A non-naturally occurring growth hormone activity (GHA) protein comprising eleven amino acid substitutions as compared to the hGH protein of SEQ ID NO: 14, said substitutions comprising A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, L114M, G161M, and S184A, and comprising the amino acid sequence of SEQ ID NO:3.
2. A non-naturally occurring GHA protein comprising ten amino acid substitutions as compared to the hGH protein of SEQ ID NO: 14, said amino acid substitutions comprising A13V, T27V, Y28F, F54Y, S55A, S79A, S85A, V90I, G161M, and S184A, and comprising the amino acid sequence of SEQ ID NO:9.
3. A non-naturally occurring GHA protein comprising eight amino acid substitutions as compared to the hGH protein of SEQ ID NO: 14, said amino acid substitutions comprising A13V, T27V, S55A, S79A, S85A, V90I, G161M, and S184A, and comprising the amino acid sequence of SEQ ID NO:5.
4. A non-naturally occurring GHA protein comprising six amino acid substitutions as compared to the hGH protein of SEQ ID NO: 14, said amino acid substitutions comprising A13V, T27V, S79A, V90I, G161M, and S184A, and comprising the amino acid sequence of SEQ ID NO:4.
5. A recombinant nucleic acid encoding the non-naturally occurring GHA protein of claim 1, 2, 3, or 4.
6. An expression vector comprising the recombinant nucleic acid of claim 5.
7. A host cell comprising the recombinant nucleic acid of claim 5.
8. A host cell comprising the expression vector of claim 6.
9. A method of producing a non-naturally occurring GHA protein comprising culturing the host cell of claim 8 under conditions suitable for expression of said nucleic acid.
10. A method according to claim 9, further comprising recovering said GHA protein.
11. A pharmaceutical composition comprising a GHA protein according to claim 1, 2, 3, or 4 and a pharmaceutical carrier.